
A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait

  • Original Research

Abstract

A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some “baseline” family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL, with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but underestimates it in the second one; here, the true number of clusters is over 900, and the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. For a set of combinations of parameters of a given prior distribution, the effects of the prior dissipated when the number of replicate samples per genotype was increased. Thus, the Dirichlet process model seems useful for gauging the number of QTL affecting the trait: if the number of clusters inferred is small, probably just a few QTL code for the trait. If the number of clusters inferred is large, this may imply that standard parametric models based on the baseline distribution may suffice. However, priors may be influential, especially if sample size is not large and if only a few genotypic configurations have replicate phenotypes in the sample.


References

  • Antoniak CE (1974) Mixtures of Dirichlet processes with applications to non-parametric problems. Ann Stat 2:1152–1174

  • Bush CA, MacEachern SN (1996) A semiparametric Bayesian model for randomized block designs. Biometrika 83:275–285

  • Cockerham CC (1954) An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39:859–882

  • Crow JF, Kimura M (1970) An introduction to population genetics theory. Harper and Row, New York

  • Dahl DB (2006) Model-based clustering for expression data via a Dirichlet process mixture model. In: Do KA, Muller P, Vannucci M (eds) Bayesian inference for gene expression and proteomics. Cambridge University Press, Cambridge

  • De Los Campos G, Gianola D, Rosa GJM (2009a) Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. J Anim Sci 87:1883–1887

  • De Los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, Weigel K, Cotes JM (2009b) Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182:375–385

  • Dempster ER, Lerner IM (1950) Heritability of threshold characters. Genetics 35:212–236

  • Escobar MD (1994) Estimating normal means with a Dirichlet process prior. J Am Stat Assoc 89:268–275

  • Escobar MD, West M (1998) Computing non-parametric hierarchical models. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 1–22

  • Falconer DS (1965) The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann Hum Genet 29:51–76

  • Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230

  • Foster SD, Verbyla AP, Pitchford WS (2007) Incorporating LASSO effects into a mixed model for QTL detection. J Agric Biol Environ Stat 12:300–314

  • Gianola D, De Los Campos G (2008) Inferring genetic values for quantitative traits non-parametrically. Genet Res 90:525–540

  • Gianola D, Foulley JL (1983) Sire evaluation for ordered categorical data with a threshold model. Genet Sel Evol 15:201–223

  • Gianola D, Simianer H (2006) A Thurstonian model for quantitative genetic analysis of ranks: a Bayesian approach. Genetics 174:1613–1624

  • Gianola D, van Kaam JBCHM (2008) Reproducing kernel Hilbert spaces methods for genomic assisted prediction of quantitative traits. Genetics 178:2289–2303

  • Gianola D, Perez-Enciso M, Toro MA (2003) On marker-assisted prediction of genetic value: beyond the ridge. Genetics 163:347–365

  • Gianola D, Fernando RL, Stella A (2006a) Genomic assisted prediction of genetic value with semi-parametric procedures. Genetics 173:1761–1776

  • Gianola D, Heringstad B, Ødegård J (2006b) On the quantitative genetics of mixture characters. Genetics 173:2247–2255

  • Gianola D, De Los Campos G, Hill WG, Manfredi E, Fernando RL (2009) Additive genetic variability and the Bayesian alphabet. Genetics (submitted)

  • González-Recio O, Gianola D, Long N, Weigel KA, Rosa GJM, Avendaño S (2008) Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178:2305–2313

  • González-Recio O, Gianola D, Rosa GJM, Weigel KA, Avendaño S (2009) Genome-assisted prediction of a quantitative trait in parents and progeny: application to food conversion rate in chickens. Genet Sel Evol (in press)

  • Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92:433–443

  • Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6:95–108

  • Ibrahim JG, Kleinman KP (1998) Semiparametric Bayesian methods for random effects models. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York

  • Jannink JL, Wu XL (2004) Estimating allelic number and identity in state of QTLs in interconnected families. Genet Res 81:133–144

  • Kleinman KP, Ibrahim JG (1998) A semiparametric Bayesian approach to the random effects model. Biometrics 54:921–938

  • Lee HKH (2004) Bayesian nonparametrics via neural networks. ASA-SIAM, Philadelphia

  • Long N, Gianola D, Rosa GJM, Weigel KA, Avendaño S (2007) Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. J Anim Breed Genet 124:377–389

  • MacEachern SN (1994) Estimation of normal means with a conjugate style Dirichlet process prior. Comm Statist Sim 23:727–741

  • Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829

  • Motsinger-Reif AA, Dudek SM, Hahn LW, Ritchie MD (2008) Comparison of approaches for machine learning optimization of neural networks for detecting gene-gene interactions in genetic epidemiology. Genet Epidemiol 32:325–340

  • Park T, Casella G (2008) The Bayesian Lasso. J Am Stat Assoc 103:681–686

  • Searle SR (1971) Linear models. Wiley, New York

  • Sorensen D, Gianola D (2002) Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, New York

  • Templeton AR (2000) Epistasis and complex traits. In: Wolf JB et al (eds) Epistasis and the evolutionary process. Oxford University Press, New York, pp 41–57

  • Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J Roy Stat Soc B 58:267–288

  • van der Merwe AJ, Pretorius AL (2003) Bayesian estimation in animal breeding using the Dirichlet process prior for correlated random effects. Genet Sel Evol 35:137–158

  • VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423

  • Wang CS, Rutledge JJ, Gianola D (1993) Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet Sel Evol 25:41–62

  • Wang CS, Rutledge JJ, Gianola D (1994) Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet Sel Evol 26:91–115

  • West M (1992) Hyperparameter estimation in Dirichlet process mixture models. Technical Report 92-A03, 6 pp, ISDS, Duke University

  • Xu S (2003) Estimating polygenic effects using markers of the entire genome. Genetics 163:789–801

  • Yi N, Xu S (2008) Bayesian LASSO for quantitative trait loci mapping. Genetics 179:1045–1055


Acknowledgments

Part of this work was carried out while the senior author was a Visiting Professor at Georg-August-Universität, Göttingen (Alexander von Humboldt Foundation Senior Researcher Award), and Visiting Scientist at the Station d’Amélioration Génétique des Animaux, Centre de Recherche de Toulouse (Chaire d’Excellence Pierre de Fermat, Agence Innovation, Midi-Pyrénées). Support by the Wisconsin Agriculture Experiment Station, and by grant NSF DMS-044371 to the first and second authors, is acknowledged. Aviagen Ltd. (Newbridge, Scotland) is thanked for providing the chicken data. A FORTRAN program for implementing the specific model described in the paper is available upon request to nick.wu@ansci.wisc.edu.

Author information


Corresponding author

Correspondence to Daniel Gianola.

Appendices

Appendix A: Technical details on drawing genomic effects

Computing \(K_{i}\) is critical for implementing the DP methodology. When the baseline distribution is \(N( g_{i}|0,\sigma_{g}^{2}) ,\) the integral in (10), represented as \(K_{i}\), is expressible in closed form, because the integrand is the product of two normal densities. Here, \( \varvec{\beta }\) and all \(g\)'s other than \(g_{i}\) enter as fixed parameters, while \(g_{i}\) follows the \(N\left( g_{i}|0,\sigma _{g}^{2}\right) \) distribution. Standard integration yields

$$ \begin{aligned} K_{i}&=N\left( {\bf y|X\beta }+\sum\limits_{j\neq i}^{C}{\bf z} _{g_{j}}g_{j},{\bf V}_{i}\right) \\ &=\frac{1}{\left( 2\pi \right)^{\frac{n}{2}}\left\vert {\bf V} _{i}\right\vert ^{\frac{1}{2}}}\exp \left[ -\frac{\left({\bf y-X\beta}-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) ^{\prime }{\bf V }_{i}^{-1}\left( {\bf y-X\beta }-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right)}{2}\right] \end{aligned} $$
(26)

where \({\bf V}_{i}\) is the \(n{\times}n\) matrix

$$ {\bf V}_{i}={\bf z}_{g_{i}}{\bf z}_{g_{i}}^{\prime }\sigma _{g}^{2}+ {\bf I}_{n}\sigma _{e}^{2}. $$

It is shown in “Appendix B” that the form of \({\bf V}_{i}\) (after rearrangement of observations, such that the first \(n_{i}\) records are those from individuals with configuration i) is

$$ {\bf V}_{i}=\left[ \begin{array}{cc} {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2} & {\bf 0} \\ {\bf 0} & {\bf I}_{n-n_{i}}\sigma _{e}^{2} \end{array}\right] , $$

where \({\bf J}_{n_{i}}\) is a matrix of ones of order \(n_{i}\times n_{i}\), and \(n-n_{i}\) is the number of individuals with records that have genotypes other than i. Using results in Searle (1971),

$$ \begin{aligned} \left\vert {\bf V}_{i}\right\vert &=\left\vert {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2}\right\vert \left\vert {\bf I} _{n-n_{i}}\sigma _{e}^{2}\right\vert , \\ \left\vert {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2}\right\vert &=\sigma _{e}^{2\left( n_{i}-1\right) }\left( \sigma _{e}^{2}+n_{i}\sigma _{g}^{2}\right) , \\ \left\vert {\bf I}_{n-n_{i}}\sigma _{e}^{2}\right\vert &=\sigma _{e}^{2\left( n-n_{i}\right) }, \\ \end{aligned} $$

so that

$$ \left\vert {\bf V}_{i}\right\vert =\sigma _{e}^{2\left( n-1\right) }\left( \sigma _{e}^{2}+n_{i}\sigma _{g}^{2}\right) . $$
(27)

Likewise,

$$ \begin{aligned} {\bf V}_{i}^{-1} &=\left[\begin{array}{cc} {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2} & {\bf 0}\\ {\bf 0} & {\bf I}_{n-n_{i}}\sigma _{e}^{2} \end{array}\right] ^{-1} \\ &=\left[ \begin{array}{cc}\frac{1}{\sigma _{e}^{2}}{\bf I}_{n_{i}}+\frac{1}{n_{i}}\left( \frac{1}{ \sigma _{e}^{2}+n_{i}\sigma _{g}^{2}}-\frac{1}{\sigma _{e}^{2}}\right) {\bf J}_{n_{i}} & {\bf 0} \\ {\bf 0} & \frac{1}{\sigma _{e}^{2}}{\bf I}_{n-n_{i}} \end{array}\right] . \end{aligned} $$
(28)
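
As a numerical check of (27) and (28), the following minimal sketch (in Python, with arbitrary illustrative values of \(n\), \(n_{i}\), \(\sigma _{g}^{2}\) and \(\sigma _{e}^{2}\); not part of the program mentioned in the Acknowledgments) compares the closed forms against direct dense-matrix evaluation:

```python
import numpy as np

# Illustrative values (not taken from the paper's data)
n, n_i = 7, 3          # total records; records sharing genotypic configuration i
sg2, se2 = 0.8, 1.5    # sigma_g^2 and sigma_e^2

# Build V_i after rearranging records so the first n_i share configuration i
J = np.ones((n_i, n_i))
Vi = np.block([
    [J * sg2 + np.eye(n_i) * se2, np.zeros((n_i, n - n_i))],
    [np.zeros((n - n_i, n_i)),    np.eye(n - n_i) * se2],
])

# Determinant: closed form (27) vs direct evaluation
det_closed = se2 ** (n - 1) * (se2 + n_i * sg2)
assert np.isclose(np.linalg.det(Vi), det_closed)

# Inverse: closed form (28) vs direct inversion
upper = np.eye(n_i) / se2 + (1.0 / n_i) * (1.0 / (se2 + n_i * sg2) - 1.0 / se2) * J
Vi_inv_closed = np.block([
    [upper,                    np.zeros((n_i, n - n_i))],
    [np.zeros((n - n_i, n_i)), np.eye(n - n_i) / se2],
])
assert np.allclose(np.linalg.inv(Vi), Vi_inv_closed)
```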

Now, with the records arranged such that those of the \(n_{i}\) individuals with configuration i precede the data points of the \(n-n_{i}\) individuals having a different genotype, let such rearrangement (indexed by i) lead to

$$ \left( {\bf y-X\beta }-\sum\limits_{j\neq i}^{C}{\bf z} _{g_{j}}g_{j}\right) _{i}=\left[ \begin{array}{c} {\bf z}_{y\in i}\left( {\bf y},\varvec{\beta },{\bf g}\right) \\ {\bf z}_{y\notin i}\left( {\bf y},\varvec{\beta },{\bf g}\right) \end{array}\right] , $$

where \({\bf z}_{y\in i}\left( {\bf y},\varvec{\beta },{\bf g}\right) \) denotes the elements of \(\left( {\bf y-X\beta }-\sum\limits_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) \) involving \(g_{i}\), and \({\bf z}_{y\notin i}\left( {\bf y},\varvec{\beta },{\bf g}\right) \) indicates records in the complement. Using (27) and (28) in (26),

$$ \begin{aligned} K_{i} &=\frac{1}{\left( 2\pi \right)^{\frac{n}{2}}\left[ \sigma _{e}^{2\left( n-1\right) }\left(\sigma _{e}^{2}+n_{i}\sigma _{g}^{2}\right) \right] ^{\frac{1}{2}}} \\ &\times \exp \left[ -\frac{\left( {\bf y-X\beta } -\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right)_{i}^{\prime }{\bf V}_{i}^{-1}\left( {\bf y-X\beta }-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) _{i}}{2}\right]\\&=\frac{\exp \left[ -\frac{\left( {\bf y-X\beta }-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) ^{\prime }\left( {\bf y-X\beta }- \sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right)}{2\sigma _{e}^{2}} \right] }{\left( 2\pi \right)^{\frac{n}{2}}\left[\sigma_{e}^{2\left( n-1\right)}\left( \sigma _{e}^{2}+n_{i}\sigma_{g}^{2}\right) \right] ^{ \frac{1}{2}}}\\ &\times \exp \left[-\frac{\frac{1}{n_{i}}\left( \frac{1}{\sigma _{e}^{2}+n_{i}\sigma_{g}^{2}}-\frac{1}{\sigma _{e}^{2}}\right) \left(\sum\limits_{y\in i,j=1}^{n}z_{y\in i,j}\left( {\bf y},\varvec{\beta },{\bf g}\right) \right)^{2}}{2}\right] ,\end{aligned} $$
(29)

where \(z_{y\in i,j}\left( {\bf y},\varvec{\beta },{\bf g}\right) \) is element j of \({\bf z}_{y\in i}\left( {\bf y},\varvec{\beta },{\bf g}\right) .\) The form of \(K_{i}\) in (29) requires neither matrix inversion nor any other matrix computations.
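
For readers implementing this step, the sketch below (Python; the names resid, idx_i, sg2 and se2 are hypothetical, chosen for illustration rather than taken from the authors' program) evaluates \(K_{i}\) through (29) from the residual vector and the indices of records carrying configuration i, and checks the result against the dense multivariate normal density in (26):

```python
import numpy as np
from scipy.stats import multivariate_normal

def K_i(resid, idx_i, sg2, se2):
    """K_i via (29): resid = y - X*beta - sum_{j != i} z_{g_j} g_j (any record order);
    idx_i = indices of the n_i records carrying genotypic configuration i."""
    resid = np.asarray(resid, dtype=float)
    n, n_i = resid.size, len(idx_i)
    # log of the normalizing constant, using the determinant (27)
    log_norm = -0.5 * (n * np.log(2.0 * np.pi)
                       + (n - 1) * np.log(se2)
                       + np.log(se2 + n_i * sg2))
    quad = resid @ resid / se2                                   # first exponential in (29)
    corr = (1.0 / n_i) * (1.0 / (se2 + n_i * sg2) - 1.0 / se2) * resid[idx_i].sum() ** 2
    return np.exp(log_norm - 0.5 * (quad + corr))

# Illustrative check against the dense form (26)
rng = np.random.default_rng(1)
n, idx_i, sg2, se2 = 6, [0, 2, 5], 0.7, 1.2
resid = rng.normal(size=n)
z_i = np.zeros(n)
z_i[idx_i] = 1.0
V_i = np.outer(z_i, z_i) * sg2 + np.eye(n) * se2
assert np.isclose(K_i(resid, idx_i, sg2, se2),
                  multivariate_normal(mean=np.zeros(n), cov=V_i).pdf(resid))
```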

Appendix B: Matrix manipulations

Form of the matrix \({\bf V}_{i}\). To see the pattern, suppose that n = 4, C = 5, and that \(\varvec{\beta }={\bf 1}_{4}\mu \) (\({\bf 1}_{4}\) is a \(4{\times}1\) vector of ones), so that model (2) is

$$ \left[ \begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{array}\right] =\left[ \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}\right] \mu +\left[ \begin{array}{ccccc} 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right] \left[ \begin{array}{c} g_{1} \\ g_{2} \\ g_{3} \\ g_{4} \\ g_{5} \end{array}\right] +\left[ \begin{array}{c} e_{1} \\ e_{2} \\ e_{3} \\ e_{4} \end{array}\right] . $$

Consider the effect \(g_{1}\), so that

$$ \begin{aligned} {\bf z}_{g_{1}}&=\left[ \begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right] ,\quad {\bf z}_{g_{1}}{\bf z}_{g_{1}}^{\prime }=\left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] , \quad {\bf V}_{1}=\left[ \begin{array}{cccc} \sigma _{g}^{2}+\sigma _{e}^{2} & \sigma _{g}^{2} & 0 & 0 \\ \sigma _{g}^{2} & \sigma _{g}^{2}+\sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{1}^{-1} &=\left[ \begin{array}{ccc} \left(\begin{array}{cc}\sigma _{g}^{2}+\sigma _{e}^{2} & \sigma _{g}^{2} \\ \sigma _{g}^{2} & \sigma _{g}^{2}+\sigma _{e}^{2} \end{array}\right)^{-1} & {\bf 0} & {\bf 0} \\ {\bf 0} & \frac{1}{\sigma _{e}^{2}} & 0 \\ {\bf 0} & 0 & \frac{1}{\sigma _{e}^{2}} \end{array}\right] . \end{aligned} $$

For \(g_{2}\),

$$ \begin{aligned} {\bf z}_{g_{2}}&=\left[ \begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array}\right] ,\quad {\bf z}_{g_{2}}{\bf z}_{g_{2}}^{\prime }=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] , \quad {\bf V}_{2}=\left[ \begin{array}{cccc} \sigma _{e}^{2} & 0 & 0 & 0 \\ 0 & \sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{g}^{2}+\sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{2}^{-1} &=\left[ \begin{array}{cccc} \frac{1}{\sigma _{e}^{2}} & 0 & 0 & 0 \\ 0 & \frac{1}{\sigma _{e}^{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sigma _{g}^{2}+\sigma _{e}^{2}} & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma _{e}^{2}} \end{array}\right] . \end{aligned} $$

Likewise, for \(g_{3}\) (and analogously for \(g_{4}\)),

$$ \begin{aligned} {\bf z}_{g_{3}} &=\left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 0 \end{array}\right] ,\quad {\bf z}_{g_{3}}{\bf z}_{g_{3}}^{\prime }=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] , \quad {\bf V}_{3}=\left[ \begin{array}{cccc} \sigma _{e}^{2} & 0 & 0 & 0 \\ 0 & \sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{3}^{-1} &=\left[ \begin{array}{cccc} \frac{1}{\sigma _{e}^{2}}& 0 & 0 & 0 \\ 0 & \frac{1}{\sigma _{e}^{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sigma _{e}^{2}} & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma _{e}^{2}} \end{array}\right]. \end{aligned} $$

Finally,

$$ \begin{aligned} {\bf z}_{g_{5}} &=\left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right] ,{\bf z}_{g_{5}}{\bf z}_{g_{5}}^{\prime }=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] , \quad {\bf V}_{5}=\left[ \begin{array}{cccc} \sigma _{e}^{2} & 0 & 0 & 0 \\ 0 & \sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{g}^{2}+\sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{5}^{-1} &=\left[\begin{array}{cccc} \frac{1}{\sigma _{e}^{2}} & 0 & 0 & 0 \\ 0 & \frac{1}{\sigma _{e}^{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sigma _{e}^{2}} & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma _{g}^{2}+\sigma _{e}^{2}} \end{array}\right]. \end{aligned} $$

The general form of \({\bf V}_{i}\) (after rearrangement of observations, such that the first \(n_{i}\) records are those from individuals with configuration i) is

$$ {\bf V}_{i}=\left[ \begin{array}{cc} {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2} & {\bf 0} \\ {\bf 0} & {\bf I}_{n-n_{i}}\sigma _{e}^{2} \end{array}\right] , $$

where \({\bf J}_{n_{i}}\) is an \(n_{i}{\times}n_{i}\) matrix of ones, and \( {\bf I}_{n_{i}}\) and \({\bf I}_{n-n_{i}}\) are identity matrices of order \(n_{i}\) and \(n-n_{i}\), respectively.
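
A short illustrative sketch (Python, with arbitrary variance values) that rebuilds each \({\bf V}_{i}={\bf z}_{g_{i}}{\bf z}_{g_{i}}^{\prime }\sigma _{g}^{2}+{\bf I}_{n}\sigma _{e}^{2}\) from the toy incidence matrix above confirms the displayed patterns, including \({\bf V}_{3}={\bf V}_{4}={\bf I}_{n}\sigma _{e}^{2}\) for genotypes without records:

```python
import numpy as np

sg2, se2 = 2.0, 1.0   # arbitrary illustrative variances
# Incidence matrix of the n = 4, C = 5 toy example in model (2) above
Z = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1]], dtype=float)
n = Z.shape[0]

V = [np.outer(Z[:, i], Z[:, i]) * sg2 + np.eye(n) * se2 for i in range(Z.shape[1])]

# V_1: records 1 and 2 share g_1, so the leading 2x2 block is J_2*sg2 + I_2*se2
expected_V1 = np.block([
    [np.ones((2, 2)) * sg2 + np.eye(2) * se2, np.zeros((2, 2))],
    [np.zeros((2, 2)),                        np.eye(2) * se2],
])
assert np.allclose(V[0], expected_V1)

# V_3 and V_4: genotypes 3 and 4 have no records, so V_i = I_n * se2
assert np.allclose(V[2], np.eye(n) * se2) and np.allclose(V[3], np.eye(n) * se2)
```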

Appendix C: Conditional posterior distribution of M

The normalized density (22) is

$$ \begin{aligned} &p\left( M|x,t,ELSE\right) \\ &\quad =\frac{M^{t+a-1}\exp \left[ -M\left( b-\log x\right) \right] +CM^{t+a-2}\exp \left[ -M\left( b-\log x\right) \right] } { \int_{0}^{\infty }\left\{ M^{t+a-1}\exp \left[ -M\left( b-\log x\right) \right] +CM^{t+a-2}\exp \left[ -M\left( b-\log x\right) \right] \right\} dM}. \end{aligned} $$

The integrals in the denominator yield

$$ \begin{aligned} \int\limits_{0}^{\infty }M^{t+a-1}\exp \left[ -M\left( b-\log x\right) \right] dM &=\frac{\Upgamma \left( t+a\right) } {\left( b-\log x\right) ^{t+a}} \\ \int\limits_{0}^{\infty }CM^{t+a-2}\exp \left[ -M\left( b-\log x\right) \right] dM &= \frac{C\Upgamma \left( t+a-1\right) } {\left( b-\log x\right)^{t+a-1}}. \end{aligned} $$

Letting \(a^{{\ast}}=(t+a)\) and \(b^{\ast }=\left( b-\log x\right) \), the conditional posterior distribution becomes

$$ \begin{aligned} &p\left( M|x,t,ELSE\right) \\ &\quad =\frac{M^{a^{\ast }-1}\exp \left( -Mb^{\ast }\right) +CM^{a^{\ast }-1-1}\exp \left( -Mb^{\ast }\right) }{\frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}+\frac{C\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}} \\ &\quad =\frac{\frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}\frac{ b^{\ast a^{\ast }}}{\Upgamma \left( a^{\ast }\right) }M^{a^{\ast }-1}\exp \left( -Mb^{\ast }\right) +C\frac{\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}\frac{b^{\ast a^{\ast }-1}}{\Upgamma \left( a^{\ast }-1\right) } M^{a^{\ast }-1-1}\exp \left( -Mb^{\ast }\right) } {\frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}+\frac{C\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}} \\ &\quad =\pi _{x}Gamma\left( a^{\ast },b^{\ast }\right) +\left( 1-\pi _{x}\right) Gamma\left( a^{\ast }-1,b^{\ast }\right) . \end{aligned} $$
(30)

This is a mixture of the two Gamma distributions indicated, with mixing probabilities \({\pi}_{x}\) and \(1-{\pi}_{x}.\) Note that

$$ \pi _{x}=\frac{\frac{\Upgamma \left( a^{\ast }\right) }{b^{\ast a^{\ast }}}} { \frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}+\frac{C\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}}=\frac{\Upgamma \left( a^{\ast }\right) } {\Upgamma \left( a^{\ast }\right) +Cb^{\ast }\Upgamma \left( a^{\ast }-1\right) }. $$

Since \(\Upgamma \left( a^{\ast }\right) =\left( a^{\ast }-1\right) \Upgamma \left( a^{\ast }-1\right) ,\)

$$ \pi _{x}=\frac{a^{\ast }-1} {a^{\ast }-1+Cb^{\ast }} $$
$$ 1-\pi _{x}=\frac{Cb^{\ast }}{a^{\ast }-1+Cb^{\ast }}. $$
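
In an implementation, a new value of M is obtained by sampling the mixture (30): the first Gamma component is drawn with probability \(\pi _{x}\) and the second otherwise. A minimal sketch follows (Python; the function name is hypothetical, and the argument C denotes the constant appearing in (22)):

```python
import numpy as np

def draw_M(t, x, a, b, C, rng=None):
    """One draw of M from the two-component Gamma mixture in (30).
    t: current number of clusters; x: auxiliary variable in (0, 1);
    a, b: parameters of the Gamma prior on M; C: the constant appearing in (22).
    Assumes t + a > 1, so that both mixture components are proper."""
    rng = np.random.default_rng() if rng is None else rng
    a_star = t + a
    b_star = b - np.log(x)                        # b* = b - log x
    pi_x = (a_star - 1.0) / (a_star - 1.0 + C * b_star)
    # numpy's gamma sampler uses shape and scale, so the rate b* enters as scale 1/b*
    if rng.random() < pi_x:
        return rng.gamma(shape=a_star, scale=1.0 / b_star)        # Gamma(a*, b*)
    return rng.gamma(shape=a_star - 1.0, scale=1.0 / b_star)      # Gamma(a* - 1, b*)
```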

At the end of MCMC sampling there will be, say, S samples of the number of clusters t and of the auxiliary variable x. The density of the marginal posterior distribution of M can be estimated (West 1992) using the Rao-Blackwell estimator

$$ \begin{aligned} & p\left( M|ELSE\right)\\ &\quad =\frac{1} {S}\sum\limits_{s=1}^{S}p\left( M|x^{\left( s\right) },t^{\left( s\right) },ELSE\right) \\ &\quad =\frac{1} {S}\sum\limits_{s=1}^{S}\left[ \pi _{x^{\left( s\right) }}Gamma\left( a^{\ast \left( s\right) },b^{\ast \left( s\right) }\right) +\left( 1-\pi _{x^{\left( s\right) }}\right) Gamma\left( a^{\ast \left( s\right) }-1,b^{\ast \left( s\right) }\right) \right]. \end{aligned} $$
(31)

where \(a^{\ast \left( s\right) }=t^{\left( s\right) }+a\) and \(b^{\ast \left( s\right) }=b-\log x^{\left( s\right) }\). If a = 0 and b = 0, the Gamma prior degenerates to \(p(M) \varpropto M^{-1}\) or, equivalently, to \(p\left[ \log \left( M\right) \right] \varpropto \) constant. If such an improper prior is adopted, the Rao-Blackwell estimator reduces to

$$ \begin{aligned} p&\left( M|ELSE\right) \\ &\quad =\frac{1}{S}\sum\limits_{s=1}^{S}\left[ \pi _{x^{(s) }}Gamma\left( t^{(s)},-\log x^{(s)}\right) +\left( 1-\pi _{x^{(s)}}\right) Gamma\left( t^{\left( s\right) }-1,-\log x^{\left( s\right) }\right) \right] , \end{aligned} $$

with \(a^{\ast }=t\) and \(b^{\ast }=-\log x\) used in the calculation of \(\pi _{x^{\left( s\right) }}.\) The conditional posterior distribution of M is well defined for a = b = 0, but this does not guarantee that its marginal posterior distribution will always be proper. Hence, the uniform prior on \( {\log}(M) \) should be used cautiously.
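
The estimator (31) can be evaluated on a grid of M values by averaging the mixture densities over the S saved samples. A minimal sketch under the same notational assumptions as above (note that scipy parameterizes the Gamma by shape and scale, so the rate \(b^{\ast }\) enters as scale \(1/b^{\ast }\)):

```python
import numpy as np
from scipy.stats import gamma

def rao_blackwell_density(M_grid, t_samples, x_samples, a=0.0, b=0.0, C=1.0):
    """Estimate of p(M | ELSE) on a grid of M values, following (31):
    average the two-component Gamma mixture densities over the S saved
    samples of t (number of clusters) and x (auxiliary variable).
    a = b = 0 corresponds to the improper prior p(M) proportional to 1/M;
    the sketch assumes every t + a > 1 so that both components are proper."""
    M_grid = np.asarray(M_grid, dtype=float)
    dens = np.zeros_like(M_grid)
    for t, x in zip(t_samples, x_samples):
        a_star = t + a
        b_star = b - np.log(x)
        pi_x = (a_star - 1.0) / (a_star - 1.0 + C * b_star)
        dens += (pi_x * gamma.pdf(M_grid, a=a_star, scale=1.0 / b_star)
                 + (1.0 - pi_x) * gamma.pdf(M_grid, a=a_star - 1.0, scale=1.0 / b_star))
    return dens / len(t_samples)
```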

Cite this article

Gianola, D., Wu, XL., Manfredi, E. et al. A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait. Genetica 138, 959–977 (2010). https://doi.org/10.1007/s10709-010-9478-4
