Bayesian model averaging for evaluation of candidate gene effects


Abstract

Statistical assessment of candidate gene effects can be viewed as a problem of variable selection and model comparison. Given a certain number of genes to be considered, many possible models may fit the data well, each including a specific set of gene effects and possibly their interactions. The question arises as to which of these models is most plausible. Inference about candidate gene effects based on a specific model ignores uncertainty about model choice. Here, a Bayesian model averaging approach is proposed for evaluation of candidate gene effects. The method is implemented through simultaneous sampling of multiple models. By averaging over a set of competing models, the Bayesian model averaging approach incorporates model uncertainty into inferences about candidate gene effects. Features of the method are demonstrated using a simulated data set with ten candidate genes under consideration.


References

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  • Carlin B, Chib S (1995) Bayesian model choice via Markov chain Monte Carlo methods. J Roy Stat Soc Ser B 57:473–484

  • Carlin BP, Louis TA (1995) Bayes and empirical Bayes methods for data analysis, 2nd edn. Chapman & Hall/CRC Press, Boca Raton

  • Congdon P (2006) Bayesian model choice based on Monte Carlo estimates of posterior model probabilities. Comput Stat Data Anal 50:346–357

  • Congdon P (2007) Model weights for model choice and averaging. Stat Methodol 4:143–157

  • Dellaportas P, Forster J, Ntzoufras I (2002) On Bayesian model and variable selection using MCMC. Stat Comput 12:27–36

  • Draper D (1995) Assessment and propagation of model uncertainty. J Roy Stat Soc Ser B 57:45–97

  • Fridley B (2009) Bayesian variable and model selection methods for genetic association studies. Genet Epidemiol 33:27–37

  • Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7:457–511

  • Green P (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–732

  • Heath SC (1997) Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet 61:748–760

  • Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417

  • Jannink JL, Wu XL (2003) Estimating allelic number and identity in state of QTLs in interconnected families. Genet Res 81:133–144

  • Madigan D, Raftery AE (1994) Model selection and accounting for model uncertainty in graphical models using Occam’s window. J Am Stat Assoc 89:1535–1546

  • Miller AJ (1984) Selection of subsets of regression variables (with discussion). J Roy Stat Soc Ser A 147:387–425

  • Munafò MR (2006) Candidate gene studies in the 21st century: meta-analysis, mediation, moderation. Genes Brain Behav 5(Suppl 1):3–8

  • Pflieger S, Lefebvre V, Causse M (1996) The candidate gene approach in plant genetics: a review. Mol Breed 7:275–291

  • Raftery AE (1993) Bayesian model selection in structural equation models. In: Bollen K, Long J (eds) Testing structural equation models. Sage, Newbury Park, pp 163–180

  • Raftery AE, Madigan D, Volinsky CT (1996) Accounting for model uncertainty in survival analysis improves predictive performance (with discussion). In: Bernardo J, Berger J, Dawid A, Smith A (eds) Bayesian statistics 5. Oxford University Press, Oxford, pp 323–349

  • Regal RR, Hook EB (1991) The effect of model selection on confidence intervals for size of a closed population. Stat Med 10:717–721

  • Rothschild MF (2003) Advances in pig genomics and functional gene discovery. Comp Funct Genomics 4:266–270

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Sillanpää MJ, Arjas E (1998) Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148:1373–1388

  • Sillanpää MJ, Arjas E (1999) Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151:1605–1619

  • Sinharay S, Stein HS (2005) An empirical comparison of methods for computing Bayes factors in generalized linear mixed models. J Comput Graph Stat 14:415–435

  • Sorensen D, Gianola D (2002) Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, New York

  • Tierney L, Kadane JB (1986) Accurate approximations for posterior moments and marginal densities. J Am Stat Assoc 81:82–86

  • Uimari P, Hoeschele I (1997) Mapping-linked quantitative trait loci using Bayesian analysis and Markov chain Monte Carlo algorithms. Genetics 146:735–743

  • Wu XL, Jannink JL (2004) Optimal sampling of a population to determine QTL location, variance, and allelic number. Theor Appl Genet 108:1434–1442

  • Wu XL, Macneil MD, De S, Xiao QJ, Michal JJ, Gaskins CT, Reeves JJ, Busboom JR, Wright RW Jr, Jiang Z (2005) Evaluation of candidate gene effects for beef backfat via Bayesian model selection. Genetica 125:103–113

  • Yi N, Xu S (2000a) Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155:1391–1403

  • Yi N, Xu S (2000b) Bayesian mapping of quantitative trait loci under the identity-by-descent-based variance component model. Genetics 156:411–422


Acknowledgments

This research was supported by the Wisconsin Agriculture Experiment Station, and was partially supported by National Research Initiative Grant no. 2009-35205-05099 from the USDA Cooperative State Research, Education, and Extension Service, NSF DEB-0089742, and NSF DMS-044371. KAW acknowledges financial support from the National Association of Animal Breeders (Columbia, MO). Comments from the anonymous reviewers and the editor are acknowledged.


Corresponding author

Correspondence to Xiao-Lin Wu.

Appendix: Preliminary model selection using the Occam’s Window method

The procedure described here is adapted from the Occam’s Window (OW) method of Madigan and Raftery (1994), which averages over a set of data-supported models. There are two principles underlying OW. First, a model that predicts the data far worse than the model producing the best predictions should no longer be considered. That is, models not belonging to set

$$ A^{\prime} = \left\{ {M_{k} :{\frac{{\max_{l} \left\{ {p\left( {M_{l} |{\mathbf{y}}} \right)} \right\}}}{{p\left( {M_{k} |{\mathbf{y}}} \right)}}} \le c} \right\} $$
(17)

should be excluded, where c is a chosen value (e.g., c = 20, by analogy with the popular 0.05 cutoff for P-values). The second (optional) principle appeals to Occam’s razor, excluding complex models (\(M_k\)) that receive less support from the data than their simpler counterparts (\(M_l\)), i.e., excluding models in the set:

$$ B = \left\{ {M_{k} :M_{l} \in A^{\prime}\;{\text{and}}\;M_{l} \in M_{k} \;{\text{and}}\;{\frac{{p\left( {M_{l} |{\mathbf{y}}} \right)}}{{p\left( {M_{k} |{\mathbf{y}}} \right)}}} > 1} \right\} $$
(18)

Then, Bayesian model averaging (2) is replaced by

$$ p\left( {\Updelta |{\mathbf{y}}} \right) = \sum\limits_{{M_{k} \in A}} {p\left( {\Updelta |M_{k} ,{\mathbf{y}}} \right)p\left( {M_{k} |{\mathbf{y}}} \right)} $$
(19)

where A contains models in set A′ but not in set B, and all probabilities are implicitly conditional on the set of models in A. So, the BMA problem is reduced to finding set A.
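
As a minimal illustration of (19), the sketch below (in Python, with hypothetical model labels, posterior model probabilities, and model-specific posterior means) averages a model-specific posterior summary of Δ over a retained set A, after renormalizing the posterior model probabilities within A.

```python
# Minimal sketch of Bayesian model averaging over a retained model set A (Eq. 19).
# Model labels, probabilities, and posterior means are hypothetical illustration values.

# Posterior model probabilities p(M_k | y) for the models kept in set A
post_model_prob = {"M6": 0.80, "M4": 0.15, "M2": 0.05}

# Posterior mean of the quantity of interest Delta under each model, E(Delta | M_k, y)
post_mean_delta = {"M6": 2.1, "M4": 1.9, "M2": 1.7}

# Renormalize within A, since all probabilities are conditional on the models in A
total = sum(post_model_prob.values())
weights = {m: p / total for m, p in post_model_prob.items()}

# Model-averaged posterior mean of Delta
bma_mean = sum(weights[m] * post_mean_delta[m] for m in weights)
print(f"BMA posterior mean of Delta: {bma_mean:.3f}")
```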

Rejection based on OW rests on the interpretation of the ratio of the posterior probabilities of two competing models. Let \( r = \frac{p\left( M_{0}|{\mathbf{y}} \right)}{p\left( M_{1}|{\mathbf{y}} \right)} \), where \(M_0\) and \(M_1\) denote the “smaller” and “larger” models, respectively, and \( p\left( M_{k}|{\mathbf{y}} \right) \) is the marginal (with respect to the parameters) posterior probability that \(M_k\) is the true model (\(k = 0, 1\)). Let \(O_L\) and \(O_R\) define the left and right boundaries of OW, respectively. If the evidence supports \(M_0\) (i.e., \(r > O_R\)), then \(M_1\) is rejected; \(M_0\) is rejected if \(r < O_L\) (optionally, rejecting the “smaller” model \(M_0\) may require stronger evidence, by choosing \(O_L < O_R^{-1}\)). Once \(M_0\) is rejected, all of its “submodels” are rejected as well. If \(O_L \le r \le O_R\), both models stay.
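
As a compact restatement of this decision rule (a sketch, not the authors’ implementation), the following function takes the log posterior odds \(\log r\) together with the log window boundaries and reports which model, if either, is rejected.

```python
import math

def occam_window_decision(log_r, log_OL, log_OR):
    """Decide between a smaller model M0 and a larger model M1.

    log_r  : log posterior odds, log[p(M0|y) / p(M1|y)]
    log_OL : log of the left boundary O_L of the Occam's Window
    log_OR : log of the right boundary O_R
    """
    if log_r > log_OR:      # decisive evidence for the smaller model
        return "reject M1"
    if log_r < log_OL:      # decisive evidence for the larger model;
        return "reject M0"  # all submodels of M0 are rejected as well
    return "keep both"      # r falls inside the window

# Boundaries used in the worked example below (base-10 logarithms)
print(occam_window_decision(1.54, math.log10(1 / 20), math.log10(20)))  # reject M1
```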

Computing model posterior probability (3) requires evaluating integral (4), which may be analytically difficult or even impossible in many practical situations. Laplace’s approximation (Tierney and Kadane 1986) can be used to compute the marginal likelihood, say, of model k:

$$ p\left( {\mathbf{y}}|M_{k} \right) \approx \left( 2\pi \right)^{p_{k}/2} \left| {\mathbf{H}}_{\tilde{\varvec{\theta}}_{k}}^{-1} \right|^{1/2} p\left( {\mathbf{y}}|\tilde{\varvec{\theta}}_{k}, M = k \right) p\left( \tilde{\varvec{\theta}}_{k}|M = k \right) $$
(20)

where \(p_k\) is the dimension of \({\varvec{\theta}}_{k}\), \( \tilde{\varvec{\theta}}_{k} \) is the posterior mode of \({\varvec{\theta}}_{k}\), and \( {\mathbf{H}}_{\tilde{\varvec{\theta}}_{k}}^{-1} \) is minus the inverse Hessian of \( h\left( {\varvec{\theta}}_{k} \right) = \log \left\{ p\left( {\mathbf{y}}|{\varvec{\theta}}_{k}, M = k \right) p\left( {\varvec{\theta}}_{k}|M = k \right) \right\} \), evaluated at \( {\varvec{\theta}}_{k} = \tilde{\varvec{\theta}}_{k} \). A computationally convenient variant of approximation (20) uses the maximum likelihood estimator \( \hat{\varvec{\theta}}_{k} \) instead of the posterior mode \( \tilde{\varvec{\theta}}_{k} \) (Sorensen and Gianola 2002). In particular, if the observations are i.i.d., then

$$ p\left( {\mathbf{y}}|M_{k} \right) \approx \left( 2\pi \right)^{p_{k}/2} n^{-p_{k}/2} \left| {\mathbf{H}}_{1,\hat{\varvec{\theta}}_{k}}^{-1} \right|^{1/2} p\left( {\mathbf{y}}|\hat{\varvec{\theta}}_{k}, M = k \right) p\left( \hat{\varvec{\theta}}_{k}|M = k \right) $$
(21)

where \( {\mathbf{H}}_{1,\hat{\varvec{\theta}}_{k}} \) is the observed information matrix computed from a single observation, evaluated at the maximum likelihood estimate of \({\varvec{\theta}}_{k}\). Suppose that the prior conveys some sort of “minimal” information, represented by \( {\varvec{\theta}}_{k}|M = k \sim N\left( \hat{\varvec{\theta}}_{k}, {\mathbf{H}}_{1,\hat{\varvec{\theta}}_{k}}^{-1} \right) \). This is a unit-information prior centered at the maximum likelihood estimator, with precision equivalent to that contributed by a sample of size n = 1. Following Sorensen and Gianola (2002), it can be shown that

$$ \log r = \log \left[ \frac{p\left( M_{0}|{\mathbf{y}} \right)}{p\left( M_{1}|{\mathbf{y}} \right)} \right] = \log \left[ \frac{p\left( {\mathbf{y}}|\hat{\varvec{\theta}}_{0}, M_{0} \right)}{p\left( {\mathbf{y}}|\hat{\varvec{\theta}}_{1}, M_{1} \right)} \right] - \frac{p_{0} - p_{1}}{2} \log n $$
(22)

Expression (22) is one-half of the difference in the Bayesian information criterion (BIC; Schwarz 1978) between the two models (BIC of \(M_1\) minus BIC of \(M_0\)). Note that evaluating (20) or (21) requires specifying the number of parameters, which is not always obvious when correlated random effects (e.g., animal or sire effects) are in the model. However, as shown in (22), what matters is the difference in the number of parameters between the two models. In a CG study, all models share the same set of infinitesimal additive effects, so computing (22) is straightforward. Nevertheless, (22) is not readily applicable to situations in which the number of random effects varies among models. A review of approaches for computing marginal likelihoods for the selection of models involving random effects is given by Sinharay and Stein (2005).
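
A direct implementation of (22) needs only the maximized log-likelihoods and the parameter counts of the two models. The sketch below uses base-10 logarithms, matching the numerical values in the worked example that follows (log 20 ≈ 1.30); natural logarithms work equally well provided the window boundaries use the same base.

```python
import math

def log_posterior_odds(loglik0, loglik1, p0, p1, n, base=10.0):
    """Approximate log r = log[p(M0|y) / p(M1|y)] via Eq. (22).

    loglik0, loglik1 : maximized log-likelihoods of M0 (smaller) and M1 (larger),
                       expressed in the same log base as `base`
    p0, p1           : numbers of free parameters in M0 and M1
    n                : sample size
    """
    return (loglik0 - loglik1) - (p0 - p1) / 2.0 * math.log(n, base)

# Two models with essentially equal fit, M1 having one extra parameter, n = 1,200:
# the penalty term alone is 0.5 * log10(1200), roughly 1.54
print(log_posterior_odds(0.0, 0.0, p0=3, p1=4, n=1200))
```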

Model selection search can proceed in two directions, e.g., moving either from larger to smaller models (“down” algorithm) or from smaller to larger models (“up” algorithm). Let A and C be dynamically changing subsets of model space M, which contain “acceptable” models and “candidate” models (i.e., those currently under consideration), respectively. Both algorithms start with A = Ø and C = {set of starting models} and proceed until set C is empty. Upon completion, set A contains a set of potentially acceptable models. Finally, all models meeting either of the following two criteria are removed:

$$ \left\{ {M_{k} :M_{l} \in A^{\prime}\;{\text{and}}\;M_{l} \in M_{k} \;{\text{and}}\;\log \left( {{\frac{{p\left( {M_{l} |{\mathbf{y}}} \right)}}{{p\left( {M_{k} |{\mathbf{y}}} \right)}}}} \right) > O_{R} } \right\} $$
(23)
$$ \left\{ {M_{k} :\log \left( {{\frac{{\max_{l} \left\{ {p\left( {M_{l} |{\mathbf{y}}} \right)} \right\}}}{{p\left( {M_{k} |{\mathbf{y}}} \right)}}}} \right) > \log (c)} \right\} $$
(24)

This considerably reduces the number of models and retains a set of data-supported models for the BMA analysis.
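
The final pruning in (23) and (24) can be written as two filters over the retained models. The sketch below is a hypothetical illustration: the input dictionary is assumed to already be restricted to the set A′ of (17), each model carries its log posterior probability and the set of regression variables it contains (so that \(M_l \in M_k\) is read as \(M_l\) being nested within \(M_k\)), and both thresholds are interpreted on the log scale, consistent with the boundaries used in the worked example.

```python
def prune_occam_window(models, log_OR, log_c):
    """Remove models failing criterion (23) or (24).

    models : dict mapping model name -> (log posterior probability,
             frozenset of regression variables in that model)
    log_OR : log of the right window boundary O_R
    log_c  : log of the cutoff c in (17)/(24)
    """
    best = max(lp for lp, _ in models.values())
    kept = {}
    for name, (lp, vars_k) in models.items():
        # Criterion (24): too far below the best-supported model
        if best - lp > log_c:
            continue
        # Criterion (23): a better-supported, strictly nested (simpler) model exists
        dominated = any(
            vars_l < vars_k and lp_l - lp > log_OR
            for lp_l, vars_l in models.values()
        )
        if not dominated:
            kept[name] = models[name]
    return kept

# Hypothetical log10 posterior probabilities for three nested models
models = {
    "M6": (-1.0, frozenset({"x2"})),
    "M4": (-2.6, frozenset({"x1", "x2"})),
    "M1": (-4.2, frozenset({"x1", "x2", "x3"})),
}
print(prune_occam_window(models, log_OR=1.30, log_c=1.30))  # only M6 survives
```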

To illustrate the process, consider model selection involving the choice of three regression variables \(x_1\), \(x_2\), and \(x_3\), which may represent three candidate gene effects. Let the sample size be n = 1,200, and let \(x_2\) be the only regression variable that truly affects observation y. Without considering their interactions, there are eight possible models, including the null model, as follows:

$$ \begin{array}{*{20}c} {M1:y = \mu + x_{1} + x_{2} + x_{3} + e;} \hfill & {M2:y = \mu + x_{2} + x_{3} + e;} \hfill \\ {M3:y = \mu + x_{1} + x_{3} + e;} \hfill & {M4:y = \mu + x_{1} + x_{2} + e;} \hfill \\ {M5:y = \mu + x_{1} + e;} \hfill & {M6:y = \mu + x_{2} + e;} \hfill \\ {M7:y = \mu + x_{3} + e;} \hfill & {M8:y = \mu + e.} \hfill \\ \end{array} $$

Let \(\log (O_L) = \log (1/20) \approx -1.30\) and \(\log (O_R) = \log (20) \approx 1.30\) define the left and right boundaries of the OW, respectively (common, base-10 logarithms are used throughout this example).

Suppose that the “down” algorithm initializes with A = Ø and, for simplicity, C = {M1}. To start, pick a model, i.e., M1, from set C and add it to set A. The model selection search proceeds by comparing M1 with each of its submodels formed by removing one regression variable at a time (leading to M2, M3, and M4, respectively). If M1 is rejected, the submodel replaces M1; otherwise, it “survives” as an acceptable model. First, select a submodel of M1, say M2 (which is M1 without \(x_1\)). Because the influence of \(x_1\) on y is immaterial, models M1 and M2 would be expected to give similar likelihoods, but the latter model has one less parameter. Using (22), this leads to \( \log \left( \frac{p\left( M2|{\mathbf{y}} \right)}{p\left( M1|{\mathbf{y}} \right)} \right) \approx 0 - \frac{(-1)}{2} \log (1200) \approx 1.54 > \log (O_R) \). Thus, model selection rejects M1, and M2 replaces M1 in set C. Continue the same process with the other two submodels, M3 and M4. Assume that M3 is rejected and M4 is accepted, because the latter contains the influential variable \(x_2\). Set C now consists of two models, M2 and M4.
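
The penalty term used above can be checked directly; the appendix works with base-10 logarithms, so half of log 1,200 is about 1.54, which exceeds log 20 ≈ 1.30.

```python
import math

n = 1200
penalty = 0.5 * math.log10(n)   # BIC-type penalty for one extra parameter (base 10)
boundary = math.log10(20)       # right boundary of the Occam's Window
print(round(penalty, 2), round(boundary, 2), penalty > boundary)  # 1.54 1.3 True
```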

Next, M2 is compared with each of its submodels having one less regression variable (i.e., M6 and M7). First, select M7, which is M2 without \(x_2\). Assume that \( \log \left[ \frac{p\left( {\mathbf{y}}|\hat{\varvec{\theta}}_{7}, M7 \right)}{p\left( {\mathbf{y}}|\hat{\varvec{\theta}}_{2}, M2 \right)} \right] = -3.2 \). Then, \( \log \left[ \frac{p\left( M7|{\mathbf{y}} \right)}{p\left( M2|{\mathbf{y}} \right)} \right] \approx -3.20 + 1.54 = -1.66 < \log (O_L) \). Thus, M7 is rejected, and so is its submodel M8. Model selection then compares M2 with the other submodel, M6 (formed by removing \(x_3\) from M2). Because \(x_3\) has no significant effect on y, the likelihoods of the two models would be expected to be approximately equal, such that \( \log \left( \frac{p\left( M6|{\mathbf{y}} \right)}{p\left( M2|{\mathbf{y}} \right)} \right) \approx 0 - \frac{(-1)}{2} \log (1200) \approx 1.54 > \log (O_R) \). Thus, model selection rejects M2, and M6 replaces M2 in set C (set C now consists of M4 and M6).

As with M2, repeat the same calculation, comparing M4 with all of its submodels (i.e., M5 and M6). For simplicity, assume that M5 is rejected and M6 is accepted, because the latter contains the influential variable \(x_2\). M6 thus replaces M4 in set C; because M6 is already there, nothing needs to be added.

By the same reasoning, model selection retains M6 when it is compared with the null model M8, so M6 enters set A. There are now no more “candidate” models in set C, and the “down” algorithm ends. The outcome is set A, which here contains only one acceptable model. In real situations, however, more models are expected in set A.

The “up” algorithm starts with A = Ø and, for example, C = {M6}, where set C contains the model(s) output by the “down” algorithm. To proceed, model selection compares M6 with each of its super-models containing one more regression variable. If there is decisive evidence for the super-model, it replaces the smaller model; otherwise, the smaller model remains. Adding \(x_1\) or \(x_3\) to M6 leads to super-models M4 and M2, respectively. In this setting, model selection rejects both super-models and retains the smaller model (M6), because neither variable influences y. Upon completion, set A contains the single acceptable model, M6. Because there is only one model in set A, the evaluations based on (23) and (24) are no longer necessary, and set A is final.
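
Putting the pieces together, the “down” pass can be sketched as a simple worklist algorithm. In the sketch below, models are represented as frozensets of regression variables and `log_odds(smaller, larger)` is a user-supplied function returning the log posterior odds of the smaller versus the larger model, e.g., the BIC-type approximation (22); the toy `toy_log_odds` mimics the worked example, in which only \(x_2\) affects y and n = 1,200.

```python
import math

LOG_OL, LOG_OR = math.log10(1 / 20), math.log10(20)

def down_algorithm(start_models, log_odds):
    """'Down' pass of the Occam's Window model search (a sketch, not the authors' code).

    start_models : iterable of models, each a frozenset of variable names
    log_odds     : function (smaller, larger) -> log posterior odds, e.g. Eq. (22)
    """
    accepted, candidates = set(), set(start_models)
    while candidates:
        model = candidates.pop()
        survives = True
        for var in model:
            sub = model - {var}            # submodel with one variable removed
            lr = log_odds(sub, model)
            if lr > LOG_OR:                # decisive support for the submodel:
                survives = False           # the larger model is rejected
                candidates.add(sub)
            elif lr >= LOG_OL:             # inside the window: keep both,
                candidates.add(sub)        # so the submodel is also examined
            # lr < LOG_OL: the submodel (and all its submodels) is rejected
        if survives:
            accepted.add(model)
    return accepted

# Toy illustration mimicking the worked example: dropping x2 costs 3.2 units of
# log10-likelihood, dropping x1 or x3 costs nothing; n = 1200 gives a 1.54 penalty.
def toy_log_odds(smaller, larger):
    loglik_change = -3.2 if "x2" in (larger - smaller) else 0.0
    return loglik_change + 0.5 * math.log10(1200)

full = frozenset({"x1", "x2", "x3"})
print(down_algorithm([full], toy_log_odds))   # {frozenset({'x2'})}, i.e., model M6
```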

Cite this article

Wu, XL., Gianola, D., Rosa, G.J.M. et al. Bayesian model averaging for evaluation of candidate gene effects. Genetica 138, 395–407 (2010). https://doi.org/10.1007/s10709-009-9433-4
