A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait

Gianola, Daniel; Wu, Xiao-Lin; Manfredi, Eduardo; Simianer, Henner

doi:10.1007/s10709-010-9478-4

Original Research
Published: 25 August 2010

Genetica, Volume 138, pages 959–977 (2010)

Daniel Gianola¹,²,³,⁴, Xiao-Lin Wu¹, Eduardo Manfredi³ & Henner Simianer⁴

Abstract

A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some “baseline” family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, and estimates of the parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first, there are 3 unknown QTL with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In both populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population but underestimates it in the second: there, the true number of clusters is over 900, whereas the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, inference about the number of clusters was affected by the prior distribution of M.
For a set of combinations of parameters of a given prior distribution, the effects of the prior dissipated when the number of replicate samples per genotype was increased. Thus, the Dirichlet process model seems to be useful for gauging the number of QTLs affecting the trait: if the number of clusters inferred is small, probably just a few QTLs code for the trait; if it is large, standard parametric models based on the baseline distribution may suffice. However, priors may be influential, especially if sample size is not large and if only a few genotypic configurations have replicate phenotypes in the sample.

References

Antoniak CE (1974) Mixtures of Dirichlet processes with applications to non-parametric problems. Ann Stat 2:1152–1174
Bush CA, MacEachern SN (1996) A semiparametric Bayesian model for randomized block designs. Biometrika 83:275–285
Cockerham CC (1954) An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present. Genetics 39:859–882
Crow JF, Kimura M (1970) An introduction to population genetics theory. Harper and Row, New York
Dahl DB (2006) Model-based clustering for expression data via a Dirichlet process mixture model. In: Do KA, Müller P, Vannucci M (eds) Bayesian inference for gene expression and proteomics. Cambridge University Press, Cambridge
De Los Campos G, Gianola D, Rosa GJM (2009a) Reproducing kernel Hilbert spaces regression: a general framework for genetic evaluation. J Anim Sci 87:1883–1887
De Los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, Weigel K, Cotes JM (2009b) Predicting quantitative traits with regression models for dense molecular markers and pedigrees. Genetics 182:375–385
Dempster ER, Lerner IM (1950) Heritability of threshold characters. Genetics 35:212–236
Escobar MD (1994) Estimating normal means with a Dirichlet process prior. J Am Stat Assoc 89:268–275
Escobar MD, West M (1998) Computing non-parametric hierarchical models. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, pp 1–22
Falconer DS (1965) The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann Hum Genet 29:51–76
Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
Foster SD, Verbyla AP, Pitchford WS (2007) Incorporating LASSO effects into a mixed model for QTL detection. J Agric Biol Environ Stat 12:300–314
Gianola D, De Los Campos G (2008) Inferring genetic values for quantitative traits non-parametrically. Genet Res 90:525–540
Gianola D, Foulley JL (1983) Sire evaluation for ordered categorical data with a threshold model. Genet Sel Evol 15:201–223
Gianola D, Simianer H (2006) A Thurstonian model for quantitative genetic analysis of ranks: a Bayesian approach. Genetics 174:1613–1624
Gianola D, van Kaam JBCHM (2008) Reproducing kernel Hilbert spaces methods for genomic assisted prediction of quantitative traits. Genetics 178:2289–2303
Gianola D, Perez-Enciso M, Toro MA (2003) On marker-assisted prediction of genetic value: beyond the ridge. Genetics 163:347–365
Gianola D, Fernando RL, Stella A (2006a) Genomic assisted prediction of genetic value with semi-parametric procedures. Genetics 173:1761–1776
Gianola D, Heringstad B, Ødegård J (2006b) On the quantitative genetics of mixture characters. Genetics 173:2247–2255
Gianola D, de Los Campos G, Hill WG, Manfredi E, Fernando RL (2009) Additive genetic variability and the Bayesian alphabet. Genetics (submitted)
González-Recio O, Gianola D, Long N, Weigel KA, Rosa GJM, Avendaño S (2008) Nonparametric methods for incorporating genomic information into genetic evaluations: an application to mortality in broilers. Genetics 178:2305–2313
González-Recio O, Gianola D, Rosa GJM, Weigel KA, Avendaño S (2009) Genome-assisted prediction of a quantitative trait in parents and progeny: application to food conversion rate in chickens. Genet Sel Evol (in press)
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92:433–443
Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6:95–108
Ibrahim JG, Kleinman KP (1998) Semiparametric Bayesian methods for random effects models. In: Dey D, Müller P, Sinha D (eds) Practical nonparametric and semiparametric Bayesian statistics. Springer, New York
Jannink JL, Wu XL (2004) Estimating allelic number and identity in state of QTLs in interconnected families. Genet Res 81:133–144
Kleinman KP, Ibrahim JG (1998) A semiparametric Bayesian approach to the random effects model. Biometrics 54:921–938
Lee HKH (2004) Bayesian nonparametrics via neural networks. ASA-SIAM, Philadelphia
Long N, Gianola D, Rosa GJM, Weigel KA, Avendaño S (2007) Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. J Anim Breed Genet 124:377–389
MacEachern SN (1994) Estimation of normal means with a conjugate style Dirichlet process prior. Comm Statist Sim 23:727–741
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Motsinger-Reif AA, Dudek SM, Hahn LW, Ritchie MD (2008) Comparison of approaches for machine learning optimization of neural networks for detecting gene-gene interactions in genetic epidemiology. Genet Epidemiol 32:325–340
Park T, Casella G (2008) The Bayesian Lasso. J Am Stat Assoc 103:681–686
Searle SR (1971) Linear models. Wiley, New York
Sorensen D, Gianola D (2002) Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, New York
Templeton AR (2000) Epistasis and complex traits. In: Wolf JB et al. (eds) Epistasis and the evolutionary process. Oxford University Press, New York, pp 41–57
Tibshirani R (1996) Regression shrinkage and selection via the LASSO. J Roy Stat Soc B 58:267–288
van der Merwe AJ, Pretorius AL (2003) Bayesian estimation in animal breeding using the Dirichlet process prior for correlated random effects. Genet Sel Evol 35:137–158
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Wang CS, Rutledge JJ, Gianola D (1993) Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genet Sel Evol 25:41–62
Wang CS, Rutledge JJ, Gianola D (1994) Bayesian analysis of mixed linear models via Gibbs sampling with an application to litter size in Iberian pigs. Genet Sel Evol 26:91–115
West M (1992) Hyperparameter estimation in Dirichlet process mixture models. Technical Report 92-A03, 6 pp, ISDS, Duke University
Xu S (2003) Estimating polygenic effects using markers of the entire genome. Genetics 163:789–801
Yi N, Xu S (2008) Bayesian LASSO for quantitative trait loci mapping. Genetics 179:1045–1055

Acknowledgments

Part of this work was carried out while the senior author was a Visiting Professor at Georg-August-Universität, Göttingen (Alexander von Humboldt Foundation Senior Researcher Award), and Visiting Scientist at the Station d’Amélioration Génétique des Animaux, Centre de Recherche de Toulouse (Chaire D’Excellence Pierre de Fermat, Agence Innovation, Midi-Pyrénées). Support from the Wisconsin Agriculture Experiment Station and from grant NSF DMS-044371 to the first and second authors is acknowledged. Aviagen Ltd. (Newbridge, Scotland) is thanked for providing the chicken data. A FORTRAN program implementing the specific model described in the paper is available upon request to nick.wu@ansci.wisc.edu.

Author information

Authors and Affiliations

Department of Animal Sciences and Department of Dairy Science, University of Wisconsin-Madison, 1675 Observatory Dr, Madison, WI, 53706, USA
Daniel Gianola & Xiao-Lin Wu
Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, 1432, Ås, Norway
Daniel Gianola
Institut National de la Recherche Agronomique, UR631 Station d’Amélioration Génétique des Animaux, BP 52627, 32326, Castanet-Tolosan, France
Daniel Gianola & Eduardo Manfredi
Department of Animal Sciences, Georg-August-Universität, Göttingen, Germany
Daniel Gianola & Henner Simianer

Corresponding author

Correspondence to Daniel Gianola.

Appendices

Appendix A: Technical details on drawing genomic effects

Computing $K_i$ is critical for implementing the DP methodology. When the baseline distribution is $N(g_{i}|0,\sigma_{g}^{2})$, the integral in (10), denoted $K_i$, is expressible in closed form, because the integrand is the product of two normal densities. Here, $\varvec{\beta}$ and all $g$'s other than $g_i$ enter as fixed parameters, and $g_i$ follows the $N(g_{i}|0,\sigma_{g}^{2})$ distribution. Standard integration yields

$$ \begin{aligned} K_{i}&=N\left( {\bf y|X\beta }+\sum\limits_{j\neq i}^{C}{\bf z} _{g_{j}}g_{j},{\bf V}_{i}\right) \\ &=\frac{1}{\left( 2\pi \right)^{\frac{n}{2}}\left\vert {\bf V} _{i}\right\vert ^{\frac{1}{2}}}\exp \left[ -\frac{\left({\bf y-X\beta}-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) ^{\prime }{\bf V }_{i}^{-1}\left( {\bf y-X\beta }-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right)}{2}\right] \end{aligned} $$

(26)

where ${\bf V}_{i}$ is the $n{\times}n$ matrix

$$ {\bf V}_{i}={\bf z}_{g_{i}}{\bf z}_{g_{i}}^{\prime }\sigma _{g}^{2}+ {\bf I}_{n}\sigma _{e}^{2}. $$

It is shown in Appendix B that the form of ${\bf V}_{i}$ (after rearrangement of observations, such that the first $n_i$ records are those from individuals with configuration $i$) is

$$ {\bf V}_{i}=\left[ \begin{array}{cc} {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2} & {\bf 0} \\ {\bf 0} & {\bf I}_{n-n_{i}}\sigma _{e}^{2} \end{array}\right] , $$

where ${\bf J}_{n_{i}}$ is a matrix of ones of order $n_{i}\times n_{i}$, and $n-n_i$ is the number of individuals with records that have genotypes other than $i$. Using results in Searle (1971):

$$ \begin{aligned} \left\vert {\bf V}_{i}\right\vert &=\left\vert {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2}\right\vert \left\vert {\bf I} _{n-n_{i}}\sigma _{e}^{2}\right\vert , \\ \left\vert {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2}\right\vert &=\sigma _{e}^{2\left( n_{i}-1\right) }\left( \sigma _{e}^{2}+n_{i}\sigma _{g}^{2}\right) , \\ \left\vert {\bf I}_{n-n_{i}}\sigma _{e}^{2}\right\vert &=\sigma _{e}^{2\left( n-n_{i}\right) }, \\ \end{aligned} $$

so that

$$ \left\vert {\bf V}_{i}\right\vert =\sigma _{e}^{2\left( n-1\right) }\left( \sigma _{e}^{2}+n_{i}\sigma _{g}^{2}\right) . $$

(27)

Likewise,

$$ \begin{aligned} {\bf V}_{i}^{-1} &=\left[\begin{array}{cc} {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2} & {\bf 0}\\ {\bf 0} & {\bf I}_{n-n_{i}}\sigma _{e}^{2} \end{array}\right] ^{-1} \\ &=\left[ \begin{array}{cc}\frac{1}{\sigma _{e}^{2}}{\bf I}_{n_{i}}+\frac{1}{n_{i}}\left( \frac{1}{ \sigma _{e}^{2}+n_{i}\sigma _{g}^{2}}-\frac{1}{\sigma _{e}^{2}}\right) {\bf J}_{n_{i}} & {\bf 0} \\ {\bf 0} & \frac{1}{\sigma _{e}^{2}}{\bf I}_{n-n_{i}} \end{array}\right] . \end{aligned} $$

(28)

Now, with the records arranged such that those of the $n_i$ individuals with configuration $i$ precede the data points of the $n-n_i$ individuals having a different genotype, let this rearrangement (indexed by $i$) lead to

$$ \left( {\bf y-X\beta }-\sum\limits_{j\neq i}^{C}{\bf z} _{g_{j}}g_{j}\right) _{i}=\left[ \begin{array}{c} {\bf z}_{y\in i}\left( {\bf y},\varvec{\beta },{\bf g}\right) \\ {\bf z}_{y\notin i}\left( {\bf y},\varvec{\beta },{\bf g}\right) \end{array}\right] , $$

where ${\bf z}_{y\in i}({\bf y},\varvec{\beta},{\bf g})$ denotes the elements of $({\bf y-X\beta}-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j})$ for the records involving $g_i$, and ${\bf z}_{y\notin i}({\bf y},\varvec{\beta},{\bf g})$ denotes the records in the complement. Using (27) and (28) in (26):

$$ \begin{aligned} K_{i} &=\frac{1}{\left( 2\pi \right)^{\frac{n}{2}}\left[ \sigma _{e}^{2\left( n-1\right) }\left(\sigma _{e}^{2}+n_{i}\sigma _{g}^{2}\right) \right] ^{\frac{1}{2}}} \\ &\times \exp \left[ -\frac{\left( {\bf y-X\beta } -\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right)_{i}^{\prime }{\bf V}_{i}^{-1}\left( {\bf y-X\beta }-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) _{i}}{2}\right]\\&=\frac{\exp \left[ -\frac{\left( {\bf y-X\beta }-\sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right) ^{\prime }\left( {\bf y-X\beta }- \sum_{j\neq i}^{C}{\bf z}_{g_{j}}g_{j}\right)}{2\sigma _{e}^{2}} \right] }{\left( 2\pi \right)^{\frac{n}{2}}\left[\sigma_{e}^{2\left( n-1\right)}\left( \sigma _{e}^{2}+n_{i}\sigma_{g}^{2}\right) \right] ^{ \frac{1}{2}}}\\ &\times \exp \left[-\frac{\frac{1}{n_{i}}\left( \frac{1}{\sigma _{e}^{2}+n_{i}\sigma_{g}^{2}}-\frac{1}{\sigma _{e}^{2}}\right) \left(\sum\limits_{j=1}^{n_{i}}z_{y\in i,j}\left( {\bf y},\varvec{\beta },{\bf g}\right) \right)^{2}}{2}\right] ,\end{aligned} $$

(29)

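where $z_{y\in i,j}({\bf y},\varvec{\beta},{\bf g})$ is element $j$ of ${\bf z}_{y\in i}({\bf y},\varvec{\beta},{\bf g})$. The form of $K_i$ in (29) requires no matrix inversion or other matrix computation. Only two quantities are needed: the residual sum of squares over all $n$ records, and the sum of the $n_i$ residuals for configuration $i$.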
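The closed form can be checked numerically. Below is a minimal NumPy sketch under a few assumptions. The residual vector $\mathbf{r} = \mathbf{y}-\mathbf{X}\varvec{\beta}-\sum_{j\neq i}\mathbf{z}_{g_j}g_j$ is taken as already formed. The records with configuration $i$ are marked by a boolean mask. The function names `log_k_closed_form` and `log_k_direct` are hypothetical. The sketch evaluates $\log K_i$ from (29), working on the log scale to avoid underflow for large $n$, and compares it with a brute-force evaluation of (26) that builds ${\bf V}_i$ explicitly. Because (29) only uses sums, the records do not need to be physically rearranged.

```python
import numpy as np

def log_k_closed_form(r, in_cluster, sigma2_g, sigma2_e):
    """log K_i via Eq. (29); no n x n matrix is built, inverted or factorized.

    r          -- residuals y - X beta - sum_{j != i} z_{g_j} g_j (length n)
    in_cluster -- boolean mask, True for the n_i records with configuration i
    """
    r = np.asarray(r, dtype=float)
    mask = np.asarray(in_cluster, dtype=bool)
    n = r.size
    n_i = int(mask.sum())
    # log|V_i| from Eq. (27)
    log_det = (n - 1) * np.log(sigma2_e) + np.log(sigma2_e + n_i * sigma2_g)
    # quadratic form r' V_i^{-1} r, using the inverse in Eq. (28)
    quad = r @ r / sigma2_e
    if n_i > 0:  # with n_i = 0, V_i = I sigma2_e and the correction term vanishes
        coef = (1.0 / (sigma2_e + n_i * sigma2_g) - 1.0 / sigma2_e) / n_i
        quad += coef * r[mask].sum() ** 2
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + quad)

def log_k_direct(r, in_cluster, sigma2_g, sigma2_e):
    """Brute-force check of Eq. (26): V_i = z z' sigma2_g + I sigma2_e built explicitly."""
    r = np.asarray(r, dtype=float)
    z = np.asarray(in_cluster, dtype=float)
    V = np.outer(z, z) * sigma2_g + np.eye(r.size) * sigma2_e
    _, log_det = np.linalg.slogdet(V)
    quad = r @ np.linalg.solve(V, r)
    return -0.5 * (r.size * np.log(2.0 * np.pi) + log_det + quad)

rng = np.random.default_rng(0)
r = rng.normal(size=8)
mask = np.array([True, True, False, True, False, False, False, False])
print(log_k_closed_form(r, mask, 0.5, 1.2), log_k_direct(r, mask, 0.5, 1.2))
```

The two printed values should agree to rounding error. The closed form costs $O(n)$ per configuration, whereas the direct route needs an $n\times n$ factorization.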

Appendix B: Matrix manipulations

Form of matrix ${\bf V}_{i}$. To see the pattern, suppose that $n=4$, $C=5$, and that ${\bf X}\varvec{\beta}={\bf 1}_{4}\mu$ (${\bf 1}_{4}$ is a $4\times 1$ vector of ones), so that model (2) is

$$ \left[ \begin{array}{c} y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \end{array}\right] =\left[ \begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}\right] \mu +\left[ \begin{array}{ccccc} 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right] \left[ \begin{array}{c} g_{1} \\ g_{2} \\ g_{3} \\ g_{4} \\ g_{5} \end{array}\right] +\left[ \begin{array}{c} e_{1} \\ e_{2} \\ e_{3} \\ e_{4} \end{array}\right] . $$

Consider effect g₁, so that

$$ \begin{aligned} {\bf z}_{g_{1}}&=\left[ \begin{array}{c} 1 \\ 1 \\ 0 \\ 0 \end{array}\right] ,{\bf z}_{g_{1}}{\bf z}_{g_{1}}^{\prime } =\left[ \begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] , \quad {\bf V}_{1}=\left[ \begin{array}{cccc} \sigma _{g}^{2}+\sigma _{e}^{2} & \sigma _{g}^{2} & 0 & 0 \\ \sigma _{g}^{2} & \sigma _{g}^{2}+\sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{1}^{-1} &=\left[ \begin{array}{ccc} \left(\begin{array}{cc}\sigma _{g}^{2}+\sigma _{e}^{2} & \sigma _{g}^{2} \\ \sigma _{g}^{2} & \sigma _{g}^{2}+\sigma _{e}^{2} \end{array}\right)^{-1} & {\bf 0} & {\bf 0} \\ {\bf 0} & \frac{1}{\sigma _{e}^{2}} & 0 \\ {\bf 0} & 0 & \frac{1}{\sigma _{e}^{2}} \end{array}\right] . \end{aligned} $$

For g₂

$$ \begin{aligned} {\bf z}_{g_{2}}&=\left[ \begin{array}{c} 0 \\ 0 \\ 1 \\ 0 \end{array}\right] ,{\bf z}_{g_{2}}{\bf z}_{g_{2}}^{\prime } =\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] , \quad {\bf V}_{2}=\left[ \begin{array}{cccc} \sigma _{e}^{2} & 0 & 0 & 0 \\ 0 & \sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{g}^{2}+\sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{2}^{-1} &=\left[ \begin{array}{cccc} \frac{1}{\sigma _{e}^{2}} & 0 & 0 & 0 \\ 0 & \frac{1}{\sigma _{e}^{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sigma _{g}^{2}+\sigma _{e}^{2}} & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma _{e}^{2}} \end{array}\right] . \end{aligned} $$

Likewise, for $g_{3}$ (the same holds for $g_{4}$, since neither configuration has records in this example):

$$ \begin{aligned} {\bf z}_{g_{3}} &=\left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 0 \end{array}\right] ,{\bf z}_{g_{3}}{\bf z}_{g_{3}}^{\prime }=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] , \quad {\bf V}_{3}=\left[ \begin{array}{cccc} \sigma _{e}^{2} & 0 & 0 & 0 \\ 0 & \sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{3}^{-1} &=\left[ \begin{array}{cccc} \frac{1}{\sigma _{e}^{2}}& 0 & 0 & 0 \\ 0 & \frac{1}{\sigma _{e}^{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sigma _{e}^{2}} & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma _{e}^{2}} \end{array}\right]. \end{aligned} $$

Finally,

$$ \begin{aligned} {\bf z}_{g_{5}} &=\left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array}\right] ,{\bf z}_{g_{5}}{\bf z}_{g_{5}}^{\prime }=\left[ \begin{array}{cccc} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right] , \quad {\bf V}_{5}=\left[ \begin{array}{cccc} \sigma _{e}^{2} & 0 & 0 & 0 \\ 0 & \sigma _{e}^{2} & 0 & 0 \\ 0 & 0 & \sigma _{e}^{2} & 0 \\ 0 & 0 & 0 & \sigma _{g}^{2}+\sigma _{e}^{2} \end{array}\right] , \\ {\bf V}_{5}^{-1} &=\left[\begin{array}{cccc} \frac{1}{\sigma _{e}^{2}} & 0 & 0 & 0 \\ 0 & \frac{1}{\sigma _{e}^{2}} & 0 & 0 \\ 0 & 0 & \frac{1}{\sigma _{e}^{2}} & 0 \\ 0 & 0 & 0 & \frac{1}{\sigma _{g}^{2}+\sigma _{e}^{2}} \end{array}\right]. \end{aligned} $$

The general form of ${\bf V}_{i}$ (after rearrangement of observations, such that the first $n_i$ records are those from individuals with configuration $i$) is

$$ {\bf V}_{i}=\left[ \begin{array}{cc} {\bf J}_{n_{i}}\sigma _{g}^{2}+{\bf I}_{n_{i}}\sigma _{e}^{2} & {\bf 0} \\ {\bf 0} & {\bf I}_{n-n_{i}}\sigma _{e}^{2} \end{array}\right] , $$

where ${\bf J}_{n_{i}}$ is an $n_{i}\times n_{i}$ matrix of ones, and ${\bf I}_{n_{i}}$ and ${\bf I}_{n-n_{i}}$ are identity matrices of order $n_i$ and $n-n_i$, respectively.
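The small example can be reproduced numerically to confirm the block pattern, the determinant (27) and the inverse (28). The sketch below assumes NumPy. The variance values $\sigma_g^2=0.4$ and $\sigma_e^2=1$ are arbitrary illustrative choices, and the helper names `v_matrix` and `block_inverse` are hypothetical. Columns of `Z` are indexed from 0, so `g_1` is column 0.

```python
import numpy as np

# Incidence matrix of model (2) in the example above: n = 4 records, C = 5 effects.
Z = np.array([[1, 0, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1]], dtype=float)

def v_matrix(Z, i, sigma2_g, sigma2_e):
    """V_i = z_{g_i} z_{g_i}' sigma2_g + I_n sigma2_e (i is a 0-based column index)."""
    z = Z[:, i]
    return np.outer(z, z) * sigma2_g + np.eye(Z.shape[0]) * sigma2_e

def block_inverse(n_i, sigma2_g, sigma2_e):
    """Closed-form inverse of J sigma2_g + I sigma2_e (upper-left block of Eq. (28)); n_i >= 1."""
    coef = (1.0 / (sigma2_e + n_i * sigma2_g) - 1.0 / sigma2_e) / n_i
    return np.eye(n_i) / sigma2_e + coef * np.ones((n_i, n_i))

sigma2_g, sigma2_e = 0.4, 1.0
V1 = v_matrix(Z, 0, sigma2_g, sigma2_e)   # effect g_1: records 1 and 2
V3 = v_matrix(Z, 2, sigma2_g, sigma2_e)   # effect g_3: no records, so V_3 = I sigma2_e
print(V1)
# Upper-left 2 x 2 block of V_1^{-1} against Eq. (28) with n_i = 2
print(np.allclose(np.linalg.inv(V1)[:2, :2], block_inverse(2, sigma2_g, sigma2_e)))
# Determinant against Eq. (27) with n = 4, n_i = 2
print(np.isclose(np.linalg.det(V1), sigma2_e ** 3 * (sigma2_e + 2 * sigma2_g)))
```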

Appendix C: Conditional posterior distribution of M

The normalized density (22) is

$$ \begin{aligned} &p\left( M|x,t,ELSE\right) \\ &\quad =\frac{M^{t+a-1}\exp \left[ -M\left( b-\log x\right) \right] +CM^{t+a-2}\exp \left[ -M\left( b-\log x\right) \right] } { \int_{0}^{\infty }\left\{ M^{t+a-1}\exp \left[ -M\left( b-\log x\right) \right] +CM^{t+a-2}\exp \left[ -M\left( b-\log x\right) \right] \right\} dM}. \end{aligned} $$

The integrals in the denominator yield

$$ \begin{aligned} \int\limits_{0}^{\infty }M^{t+a-1}\exp \left[ -M\left( b-\log x\right) \right] dM &=\frac{\Upgamma \left( t+a\right) } {\left( b-\log x\right) ^{t+a}} \\ \int\limits_{0}^{\infty }CM^{t+a-2}\exp \left[ -M\left( b-\log x\right) \right] dM &= \frac{C\Upgamma \left( t+a-1\right) } {\left( b-\log x\right)^{t+a-1}}. \end{aligned} $$

Letting $a^{\ast}=t+a$ and $b^{\ast}=b-\log x$, the conditional posterior distribution becomes

$$ \begin{aligned} &p\left( M|x,t,ELSE\right) \\ &\quad =\frac{M^{a^{\ast }-1}\exp \left( -Mb^{\ast }\right) +CM^{a^{\ast }-1-1}\exp \left( -Mb^{\ast }\right) }{\frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}+\frac{C\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}} \\ &\quad =\frac{\frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}\frac{ b^{\ast a^{\ast }}}{\Upgamma \left( a^{\ast }\right) }M^{a^{\ast }-1}\exp \left( -Mb^{\ast }\right) +C\frac{\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}\frac{b^{\ast a^{\ast }-1}}{\Upgamma \left( a^{\ast }-1\right) } M^{a^{\ast }-1-1}\exp \left( -Mb^{\ast }\right) } {\frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}+\frac{C\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}} \\ &\quad =\pi _{x}Gamma\left( a^{\ast },b^{\ast }\right) +\left( 1-\pi _{x}\right) Gamma\left( a^{\ast }-1,b^{\ast }\right) . \end{aligned} $$

(30)

This is a mixture of the two Gamma distributions indicated (shapes $a^{\ast}$ and $a^{\ast}-1$, common rate $b^{\ast}$), with mixing probabilities $\pi_{x}$ and $1-\pi_{x}$. Note that

$$ \pi _{x}=\frac{\frac{\Upgamma \left( a^{\ast }\right) }{b^{\ast a^{\ast }}}} { \frac{\Upgamma \left( a^{\ast }\right) } {b^{\ast a^{\ast }}}+\frac{C\Upgamma \left( a^{\ast }-1\right) } {b^{\ast a^{\ast }-1}}}=\frac{\Upgamma \left( a^{\ast }\right) } {\Upgamma \left( a^{\ast }\right) +Cb^{\ast }\Upgamma \left( a^{\ast }-1\right) }. $$

Since $\Upgamma \left( a^{\ast }\right) =\left( a^{\ast }-1\right) \Upgamma \left( a^{\ast }-1\right) ,$

$$ \pi _{x}=\frac{a^{\ast }-1} {a^{\ast }-1+Cb^{\ast }} $$

$$ 1-\pi _{x}=\frac{Cb^{\ast }}{a^{\ast }-1+Cb^{\ast }}. $$
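One draw of $M$ from (30), given the current $t$ and $x$, can be sketched as follows. This is a minimal Python sketch using only the standard library. The constant $C$ in (22) is passed in as a given number. The function names `mixing_weight` and `draw_m` are hypothetical. The sketch assumes $0<x<1$, $b\ge 0$ and $t+a>1$, so that both Gamma shapes are positive. The update of the auxiliary variable $x$ itself is not given in this appendix, so it is not sketched here.

```python
import math
import random

def mixing_weight(t, x, a, b, C):
    """pi_x = (a* - 1) / (a* - 1 + C b*), with a* = t + a and b* = b - log x."""
    a_star = t + a
    b_star = b - math.log(x)
    return (a_star - 1.0) / (a_star - 1.0 + C * b_star)

def draw_m(t, x, a, b, C, rng=random):
    """One draw of M from Eq. (30): pi_x Gamma(a*, b*) + (1 - pi_x) Gamma(a* - 1, b*).

    Gamma is in shape/rate form, as in Eq. (30). random.gammavariate expects
    (shape, scale), so the scale passed is 1 / b*. Assumes 0 < x < 1, b >= 0, t + a > 1.
    """
    a_star = t + a
    b_star = b - math.log(x)
    # Pick the mixture component first, then sample from that Gamma.
    shape = a_star if rng.random() < mixing_weight(t, x, a, b, C) else a_star - 1.0
    return rng.gammavariate(shape, 1.0 / b_star)

rng = random.Random(2010)
draws = [draw_m(t=5, x=0.5, a=1.0, b=1.0, C=100, rng=rng) for _ in range(20000)]
print(sum(draws) / len(draws))
```

The Monte Carlo mean printed at the end should be close to $(a^{\ast}-1+\pi_x)/b^{\ast}$, which is the mean of the two-component mixture.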

At the end of MCMC sampling there will be, say, $S$ samples of the number of clusters $t$ and of the auxiliary variable $x$. The density of the marginal posterior distribution of $M$ can be estimated (West 1992) using the Rao-Blackwell estimator

$$ \begin{aligned} & p\left( M|ELSE\right)\\ &\quad =\frac{1} {S}\sum\limits_{s=1}^{S}p\left( M|x^{\left( s\right) },t^{\left( s\right) },ELSE\right) \\ &\quad =\frac{1} {S}\sum\limits_{s=1}^{S}\left[ \pi _{x^{\left( s\right) }}Gamma\left( a^{\ast \left( s\right) },b^{\ast \left( s\right) }\right) +\left( 1-\pi _{x^{\left( s\right) }}\right) Gamma\left( a^{\ast \left( s\right) }-1,b^{\ast \left( s\right) }\right) \right]. \end{aligned} $$

(31)

where $a^{\ast(s)}=t^{(s)}+a$ and $b^{\ast(s)}=b-\log x^{(s)}$. If $a=0$ and $b=0$, the Gamma prior degenerates to $p(M)\varpropto M^{-1}$ or, equivalently, to $p[\log(M)]\varpropto$ constant. If such an improper prior is adopted, the Rao-Blackwell estimator reduces to

$$ \begin{aligned} p&\left( M|ELSE\right) \\ &\quad =\frac{1}{S}\sum\limits_{s=1}^{S}\left[ \pi _{x^{(s) }}Gamma\left( t^{(s)},-\log x^{(s)}\right) +\left( 1-\pi _{x^{(s)}}\right) Gamma\left( t^{\left( s\right) }-1,-\log x^{\left( s\right) }\right) \right] , \end{aligned} $$

with $a^{\ast}=t$ and $b^{\ast}=-\log x$ used in the calculation of $\pi_{x^{(s)}}$. The conditional posterior distribution of $M$ is well defined for $a=b=0$, but this does not guarantee that its marginal posterior distribution will always be proper. Hence, the uniform prior on $\log(M)$ should be used cautiously.
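The Rao-Blackwell estimator (31) averages the conditional densities over the saved draws. It can be sketched in plain Python as below. The saved values of $t$ and $x$ in the example are made up for illustration and are not results from this study. The function names `gamma_pdf` and `rao_blackwell_density` are hypothetical. Setting $a=b=0$ gives the improper-prior version, which, as noted above, requires care.

```python
import math

def gamma_pdf(m, shape, rate):
    """Gamma density in shape/rate form, the parameterization used in Eqs. (30)-(31)."""
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1.0) * math.log(m) - rate * m)

def rao_blackwell_density(m, t_samples, x_samples, a, b, C):
    """Eq. (31): average of p(M | x^(s), t^(s), ELSE) over the S saved MCMC draws.

    Every t^(s) + a must exceed 1 so that both Gamma shapes are positive;
    a = b = 0 gives the improper-prior version of the estimator.
    """
    total = 0.0
    for t, x in zip(t_samples, x_samples):
        a_star = t + a
        b_star = b - math.log(x)
        pi_x = (a_star - 1.0) / (a_star - 1.0 + C * b_star)
        total += (pi_x * gamma_pdf(m, a_star, b_star)
                  + (1.0 - pi_x) * gamma_pdf(m, a_star - 1.0, b_star))
    return total / len(t_samples)

# Hypothetical saved draws of (t, x); real values would come from the Gibbs sampler.
t_saved = [5, 6, 4]
x_saved = [0.3, 0.5, 0.7]
grid = [0.01 * k for k in range(1, 1501)]
dens = [rao_blackwell_density(m, t_saved, x_saved, a=1.0, b=1.0, C=50) for m in grid]
print(sum(dens) * 0.01)  # crude Riemann sum of the estimated density over (0, 15]
```

Each term of (31) is a proper mixture of two Gamma densities. The average therefore integrates to one whenever every $b^{\ast(s)}>0$ and every $a^{\ast(s)}>1$, and the Riemann sum above gives a rough numerical check of this.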

About this article

Cite this article

Gianola, D., Wu, XL., Manfredi, E. et al. A non-parametric mixture model for genome-enabled prediction of genetic value for a quantitative trait. Genetica 138, 959–977 (2010). https://doi.org/10.1007/s10709-010-9478-4

Received: 19 February 2010
Accepted: 29 July 2010
Published: 25 August 2010
Issue Date: October 2010
DOI: https://doi.org/10.1007/s10709-010-9478-4
