Theoretical and Applied Genetics, Volume 131, Issue 1, pp 93–105

Bayesian optimization for genomic selection: a method for discovering the best genotype among a large number of candidates

  • Ryokei Tanaka
  • Hiroyoshi Iwata
Original Article


Key message

A new pre-breeding strategy based on an optimization algorithm is proposed and evaluated via simulations. This strategy can find superior genotypes with less phenotyping effort.


Genomic prediction is a promising approach for finding superior genotypes among the large number of accessions in germplasm collections preserved in gene banks. Once some accessions have been phenotyped and genotyped, a prediction model can be built, and the genotypic values of the remaining accessions can be predicted from their marker genotypes. In this study, we focused on the application of genomic prediction to pre-breeding and proposed a novel strategy that would reduce the cost of the phenotyping needed to discover better accessions. We regarded the search for superior genotypes with genomic prediction as an optimization problem and introduced Bayesian optimization to solve it. Bayesian optimization, which samples unobserved inputs using the expected improvement (EI) as a selection criterion, appeared to be beneficial in pre-breeding. The EI depends on the predicted distribution of genotypic values, whereas the usual selection depends only on the point estimate. We simulated a search for the best genotype among candidate genotypes and showed that the EI-based strategy required fewer genotypes to be phenotyped to identify the best genotype than the usual and random selection strategies. Therefore, Bayesian optimization can be useful for applying genomic prediction to pre-breeding and would reduce the number of phenotyped accessions needed to find the best accession among a large number of candidates.
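The contrast between EI-based selection and selection on the point estimate can be illustrated with a small sketch. It assumes a normal predictive distribution for each candidate's genotypic value and uses the standard closed-form EI for maximization; the prediction model and settings used in the study are not shown in this section, so the accession names, predicted means, and standard deviations below are hypothetical.

```python
import math


def expected_improvement(mean, sd, best_so_far):
    """Closed-form EI for maximization, assuming a normal predictive
    distribution N(mean, sd^2) for the candidate's genotypic value."""
    if sd <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + sd * pdf


# Hypothetical predictions for unphenotyped accessions:
# name -> (predicted mean, predictive standard deviation).
candidates = {
    "acc_A": (1.00, 0.10),
    "acc_B": (0.90, 0.60),
    "acc_C": (0.50, 0.20),
}
best_so_far = 1.10  # best phenotype observed so far (hypothetical)

# Usual selection: pick the highest point estimate.
by_mean = max(candidates, key=lambda k: candidates[k][0])

# EI-based selection: pick the highest expected improvement.
by_ei = max(
    candidates,
    key=lambda k: expected_improvement(*candidates[k], best_so_far),
)

print("selected by point estimate:", by_mean)
print("selected by EI:", by_ei)
```

In this toy setting the two criteria choose different accessions: the point estimate favours the candidate with the highest predicted mean, while EI favours a candidate with a slightly lower mean but much larger predictive uncertainty, because it has a greater chance of exceeding the current best.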



This work was supported by JSPS KAKENHI Grant Number 16H04858.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1: 122_2017_2988_MOESM1_ESM.pdf (PDF, 2746 kb)



Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
