Theoretical and Applied Genetics, Volume 125, Issue 7, pp 1575–1587

Swift block-updating EM and pseudo-EM procedures for Bayesian shrinkage analysis of quantitative trait loci

  • Crispin M. Mutshinda
  • Mikko J. Sillanpää
Original Paper

Abstract

Introduction

Virtually all existing expectation-maximization (EM) algorithms for quantitative trait locus (QTL) mapping overlook the covariance structure of genetic effects, even though this information can help enhance the robustness of model-based inferences.

Results

Here, we propose fast EM and pseudo-EM-based procedures for Bayesian shrinkage analysis of QTLs, designed to accommodate the posterior covariance structure of genetic effects through a block-updating scheme, that is, by updating all genetic effects simultaneously over many cycles of iterations.
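To make the block-updating idea concrete, the following is a minimal sketch and not the authors' procedure, whose R code is in the online supplementary material. It shows a generic EM for a linear model with marker-specific Gaussian shrinkage variances, in which the E-step computes the joint posterior mean and covariance of all effects at once. Everything here is an assumption made for illustration: the flat hyperpriors, the centred phenotype, the simulated ±1 genotype codes, and the names `invert`, `block_em`, `tau2` and `sigma2`.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def block_em(x, y, n_iter=200, floor=1e-8):
    """Illustrative EM for y = X b + e with b_j ~ N(0, tau2_j), e_i ~ N(0, sigma2).

    The effects b are treated as missing data; tau2 and sigma2 are the parameters.
    """
    n, p = len(x), len(x[0])
    xtx = [[sum(x[i][j] * x[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    tau2, sigma2 = [1.0] * p, 1.0
    for _ in range(n_iter):
        # E-step (block update): joint Gaussian posterior of ALL effects,
        # with covariance (X'X / sigma2 + diag(1 / tau2))^-1.
        prec = [[xtx[j][k] / sigma2 + (1.0 / tau2[j] if j == k else 0.0)
                 for k in range(p)] for j in range(p)]
        cov = invert(prec)
        mean = [sum(cov[j][k] * xty[k] for k in range(p)) / sigma2 for j in range(p)]
        # M-step: E[b_j^2] = mean_j^2 + cov_jj, so the posterior covariance
        # feeds directly into the variance updates.
        tau2 = [max(mean[j] ** 2 + cov[j][j], floor) for j in range(p)]
        resid = [y[i] - sum(x[i][j] * mean[j] for j in range(p)) for i in range(n)]
        rss = sum(r * r for r in resid)
        trace = sum(xtx[j][k] * cov[k][j] for j in range(p) for k in range(p))
        sigma2 = max((rss + trace) / n, floor)
    return mean, cov, tau2, sigma2


# Simulated example (hypothetical data): 100 individuals, 8 markers, 2 true QTLs.
random.seed(1)
n_ind, n_mark = 100, 8
true_b = [0.0, 2.0, 0.0, 0.0, 0.0, -1.5, 0.0, 0.0]
x = [[random.choice([-1.0, 1.0]) for _ in range(n_mark)] for _ in range(n_ind)]
y = [sum(xi[j] * true_b[j] for j in range(n_mark)) + random.gauss(0.0, 1.0) for xi in x]
mean, cov, tau2, sigma2 = block_em(x, y)
```

In this sketch the off-diagonal entries of `cov` are what a marker-by-marker update would ignore; they matter most when neighbouring markers are correlated. The explicit matrix inverse is only practical for a handful of markers and would need replacing in a high-dimensional setting.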

Conclusion

Simulation results based on computer-generated and real-world marker data demonstrated the ability of our method to swiftly produce sensible results regarding the phenotype-to-genotype association. Our new method provides a robust and remarkably fast alternative to full Bayesian estimation in high-dimensional models where the computational burden associated with Markov chain Monte Carlo simulation is often unwieldy. The R code used to fit the model to the data is provided in the online supplementary material.

Keywords

Quantitative Trait Locus; Genetic Effect; Quantitative Trait Locus Mapping; Genomic Breeding Value Estimation; Posterior Covariance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

The authors wish to thank Hanni Kärkkäinen, Zitong Li and two anonymous referees for their pertinent comments and suggestions. This work was supported by research grants from the Academy of Finland and the University of Helsinki’s research funds.

Supplementary material

122_2012_1936_MOESM1_ESM.pdf (177 kb)
Supplementary material 1 (PDF 177 kb)
122_2012_1936_MOESM2_ESM.pdf (18 kb)
Supplementary material 2 (PDF 17 kb)

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Crispin M. Mutshinda (1, 5)
  • Mikko J. Sillanpää (1, 2, 3, 4)

  1. Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  2. Department of Agricultural Sciences, University of Helsinki, Helsinki, Finland
  3. Department of Mathematical Sciences, University of Oulu, Oulu, Finland
  4. Department of Biology, University of Oulu, Oulu, Finland
  5. Department of Mathematics and Computer Science, Mount Allison University, Sackville, Canada
