
Genomic prediction of dichotomous traits with Bayesian logistic models

Original Paper
Theoretical and Applied Genetics

Abstract

Bayesian methods are a popular choice for genomic prediction of genotypic values. The methodology is well established for traits with an approximately Gaussian phenotypic distribution. However, numerous important traits are dichotomous, and the observed phenotypic counts follow a Binomial distribution. Standard Gaussian generalized linear models (GLMs) are not statistically valid for this type of data. We therefore implemented Binomial GLMs with a logit link function for the BayesB and Bayesian GBLUP genomic prediction methods. We compared these models with their standard Gaussian counterparts using two experimental data sets from plant breeding, one on female fertility in wheat and one on haploid induction in maize, as well as a simulated data set. Using the simulated data, which refer to a bi-parental population of doubled haploid lines, we further investigated how training set size (N), the number of independent Bernoulli trials used for trait evaluation (n_i), and the genetic architecture of the trait influence genomic prediction accuracy and predictive ability in general, as well as the relative performance of our models. For BayesB, we additionally implemented finite mixture Binomial GLMs to account for overdispersion. Prediction accuracy increased with increasing N and n_i. For both the simulated and the experimental data sets, Binomial GLMs were superior to Gaussian models for small n_i, whereas for large n_i Gaussian models might be used as ad hoc approximations. We further show with simulated and real data sets that accounting for overdispersion in Binomial data can markedly increase prediction accuracy.
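The core modelling change in the abstract is replacing a Gaussian likelihood with a Binomial one and a logit link, in the standard GLM form y_i ~ Binomial(n_i, pi_i) with logit(pi_i) = mu + x_i'beta. The Python sketch below illustrates that idea under loudly stated assumptions and is not the authors' implementation. It simulates doubled haploid (DH) lines with markers coded -1/+1 and drawn independently (real bi-parental populations have linked markers), gives a few markers nonzero effects (a sparse, BayesB-like architecture), draws n_i Bernoulli trials per line, and then fits marker effects. The authors fit BayesB and Bayesian GBLUP by MCMC; here that is swapped for a simple maximum a posteriori (MAP) point estimate under an independent Gaussian (ridge-type) prior, found by gradient ascent. All function names, parameters, and defaults (`simulate_dh_binomial`, `fit_map_logistic`, `mix_weight`, `mix_shift`, `prior_var`, and so on) are hypothetical.

```python
import math
import random


def logistic(eta):
    """Inverse logit link, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def softplus(eta):
    """log(1 + exp(eta)), computed stably."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def linear_predictor(X, mu, beta):
    """eta_i = mu + x_i' beta for every line."""
    return [mu + sum(x * b for x, b in zip(row, beta)) for row in X]


def binomial_loglik(y, n, eta):
    """Binomial log-likelihood with logit link, dropping the constant log C(n_i, y_i).
    Uses y*log(pi) + (n - y)*log(1 - pi) = y*eta - n*log(1 + exp(eta))."""
    return sum(yi * e - ni * softplus(e) for yi, ni, e in zip(y, n, eta))


def simulate_dh_binomial(n_lines, n_markers, n_qtl, n_trials, mu=-1.0,
                         effect_sd=0.5, mix_weight=0.0, mix_shift=0.0, seed=1):
    """Hypothetical simulator. DH lines are homozygous, so markers are coded -1/+1.
    With mix_weight > 0, a random subset of lines receives an extra logit shift
    that the markers do not explain, which makes the counts overdispersed."""
    rng = random.Random(seed)
    X = [[rng.choice((-1, 1)) for _ in range(n_markers)] for _ in range(n_lines)]
    beta = [0.0] * n_markers
    for j in rng.sample(range(n_markers), n_qtl):
        beta[j] = rng.gauss(0.0, effect_sd)
    eta = linear_predictor(X, mu, beta)  # true marker-based genetic values
    y = []
    for e in eta:
        if rng.random() < mix_weight:
            e += mix_shift  # latent mixture component (assumed overdispersion mechanism)
        pi = logistic(e)
        y.append(sum(rng.random() < pi for _ in range(n_trials)))  # n_i Bernoulli trials
    n = [n_trials] * n_lines
    return X, y, n, eta


def fit_map_logistic(X, y, n, prior_var=0.25, lr=0.5, iters=300):
    """MAP estimate of (mu, beta) under the Binomial logit likelihood and an
    independent N(0, prior_var) prior on each marker effect, by plain gradient
    ascent. A point-estimate stand-in, not an MCMC sampler for BayesB or GBLUP."""
    n_lines, p = len(X), len(X[0])
    scale = n_lines * max(n)  # normalise the gradient so a fixed step size is stable
    mu, beta = 0.0, [0.0] * p
    for _ in range(iters):
        eta = linear_predictor(X, mu, beta)
        # d loglik / d eta_i = y_i - n_i * pi_i
        r = [yi - ni * logistic(e) for yi, ni, e in zip(y, n, eta)]
        grad_mu = sum(r)  # flat prior on the intercept
        grad_beta = [sum(ri * row[j] for ri, row in zip(r, X)) - beta[j] / prior_var
                     for j in range(p)]
        mu += lr * grad_mu / scale
        beta = [b + lr * g / scale for b, g in zip(beta, grad_beta)]
    return mu, beta
```

For example, `X, y, n, true_eta = simulate_dh_binomial(150, 30, 5, 10, seed=7)` followed by `fit_map_logistic(X, y, n)` returns an intercept and 30 shrunken marker effects; predicted probabilities for new lines are then `logistic(e)` for each `e` in `linear_predictor(X_new, mu_hat, beta_hat)`. Passing, say, `mix_weight=0.2, mix_shift=2.0` to the simulator is one simple, assumed way to make counts vary more than a Binomial with the same marker-based probability allows; a plain Binomial GLM ignores that extra variation, which is the situation the finite mixture Binomial GLMs in the abstract are meant to handle.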



Acknowledgements

This research was funded by the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr Synbreed—Synergistic plant and animal breeding (FKZ: 0315528d).

Author information

Correspondence to Frank Technow.

Additional information

Communicated by M. Sillanpää.

About this article

Cite this article

Technow, F., Melchinger, A.E. Genomic prediction of dichotomous traits with Bayesian logistic models. Theor Appl Genet 126, 1133–1143 (2013). https://doi.org/10.1007/s00122-013-2041-9

  • DOI: https://doi.org/10.1007/s00122-013-2041-9
