Abstract
Bayesian methods are a popular choice for genomic prediction of genotypic values. The methodology is well established for traits with an approximately Gaussian phenotypic distribution. However, numerous important traits are dichotomous in nature, and the observed phenotypic counts follow a Binomial distribution. The standard Gaussian generalized linear models (GLM) are not statistically valid for this type of data. Therefore, we implemented Binomial GLM with logit link function for the BayesB and Bayesian GBLUP genomic prediction methods. We compared these models with their standard Gaussian counterparts using two experimental data sets from plant breeding, one on female fertility in wheat and one on haploid induction in maize, as well as a simulated data set. Using the simulated data, which refer to a bi-parental population of doubled haploid lines, we further investigated the influence of training set size (N), the number of independent Bernoulli trials for trait evaluation (n_i), and the genetic architecture of the trait on genomic prediction accuracies and abilities in general and on the relative performance of our models. For BayesB, we additionally implemented finite mixture Binomial GLM to account for overdispersion. We found that prediction accuracies increased with increasing N and n_i. For the simulated and experimental data sets, we found Binomial GLM to be superior to Gaussian models for small n_i, whereas for large n_i Gaussian models might be used as ad hoc approximations. We further show with simulated and real data sets that accounting for overdispersion in Binomial data can markedly increase the prediction accuracy.
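The data model described above can be illustrated with a minimal sketch. It is not the authors' BayesB or Bayesian GBLUP implementation; it only shows the Binomial likelihood with a logit link and how observed proportions behave for small versus large numbers of Bernoulli trials. All function names, the linear predictor values, and the trial counts are hypothetical choices made for illustration.

```python
import math
import random


def inv_logit(eta):
    """Inverse logit link: map a linear predictor to a success probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_loglik(y, n, eta):
    """Log-likelihood of y successes in n Bernoulli trials given linear predictor eta."""
    p = inv_logit(eta)
    # log of the binomial coefficient, computed via log-gamma for stability
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(p) + (n - y) * math.log(1.0 - p)


def simulate_counts(etas, n_trials, rng):
    """Draw one Binomial(n_trials, inv_logit(eta)) count per line."""
    return [
        sum(rng.random() < inv_logit(eta) for _ in range(n_trials))
        for eta in etas
    ]


rng = random.Random(1)
# Hypothetical genotypic values on the logit scale for five lines (assumption).
etas = [-2.0, -1.0, 0.0, 1.0, 2.0]
for n_trials in (5, 200):
    counts = simulate_counts(etas, n_trials, rng)
    proportions = [round(y / n_trials, 2) for y in counts]
    print(n_trials, proportions)
```

With few trials per line, the observed proportions can take only a handful of discrete values bounded by 0 and 1, which is the setting where a Gaussian model fits the data least naturally. With many trials, the proportions become nearly continuous, in line with the abstract's remark that Gaussian models might then serve as ad hoc approximations.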
Acknowledgements
This research was funded by the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr Synbreed—Synergistic plant and animal breeding (FKZ: 0315528d).
Additional information
Communicated by M. Sillanpää.
About this article
Cite this article
Technow, F., Melchinger, A.E. Genomic prediction of dichotomous traits with Bayesian logistic models. Theor Appl Genet 126, 1133–1143 (2013). https://doi.org/10.1007/s00122-013-2041-9
DOI: https://doi.org/10.1007/s00122-013-2041-9