
Bayesian optimization of multivariate genomic prediction models based on secondary traits for improved accuracy gains and phenotyping costs

  • Original Article
Theoretical and Applied Genetics

Abstract

Key message

We propose a novel approach to the Bayesian optimization of multivariate genomic prediction models based on secondary traits to improve accuracy gains and phenotyping costs via efficient Pareto frontier estimation.

Abstract

Multivariate genomic prediction based on secondary traits, such as data from various omics technologies including high-throughput phenotyping (e.g., unmanned aerial vehicle-based remote sensing), has attracted much attention because it can be more accurate than genomic prediction based only on marker genotypes. Although there is a trade-off between the accuracy gains and the phenotyping costs of secondary traits, no attempt has been made to optimize this trade-off. In this study, we propose a novel approach that optimizes multivariate genomic prediction models using secondary traits measurable at early growth stages, balancing accuracy gains against phenotyping costs. The proposed approach employs Bayesian optimization to efficiently estimate the Pareto frontier, which represents the maximum accuracy attainable at a given cost. It successfully estimated the optimal secondary trait combinations across a range of costs while performing genomic prediction for only about 20% of all possible combinations. The simulation results, which reflected the characteristics of each scenario of the simulated target traits, showed that the obtained optimal combinations were reasonable. Analysis of real-time target trait data showed that the proposed multivariate genomic prediction model was significantly more accurate than the univariate genomic prediction model.
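To make the notion of a cost-accuracy Pareto frontier concrete, the following is a minimal Python sketch, not the authors' implementation (their scripts are in the ROCAPS repository). The trait names, per-trait costs, and accuracies are hypothetical values chosen only for illustration. The Bayesian optimization step that decides which combinations to evaluate is not shown. The sketch assumes some subset of combinations has already been evaluated and extracts the non-dominated ones.

```python
from itertools import combinations

# Hypothetical per-trait phenotyping costs (arbitrary units); not from the paper.
TRAIT_COSTS = {"NDVI": 1.0, "canopy_temp": 2.0, "plant_height": 3.0}

# Hypothetical prediction accuracies for the combinations evaluated so far.
# Only 6 of the 8 possible subsets are listed, mimicking partial evaluation.
EVALUATED = {
    frozenset(): 0.40,  # genomic prediction from markers only
    frozenset({"NDVI"}): 0.50,
    frozenset({"canopy_temp"}): 0.45,
    frozenset({"plant_height"}): 0.52,
    frozenset({"NDVI", "canopy_temp"}): 0.55,
    frozenset({"NDVI", "canopy_temp", "plant_height"}): 0.60,
}


def all_combinations(traits):
    """Enumerate every subset of secondary traits, including the empty set."""
    names = sorted(traits)
    return [
        frozenset(c)
        for r in range(len(names) + 1)
        for c in combinations(names, r)
    ]


def combination_cost(combo):
    """Total cost of a combination, assuming per-trait costs simply add up."""
    return sum(TRAIT_COSTS[t] for t in combo)


def pareto_frontier(points):
    """Return the (cost, accuracy, label) points not dominated by another.

    Points are swept in order of increasing cost; a point is kept only if
    its accuracy strictly exceeds that of every cheaper or equal-cost point.
    """
    ordered = sorted(points, key=lambda p: (p[0], -p[1]))
    frontier = []
    best_accuracy = float("-inf")
    for cost, accuracy, label in ordered:
        if accuracy > best_accuracy:
            frontier.append((cost, accuracy, label))
            best_accuracy = accuracy
    return frontier


points = [(combination_cost(c), acc, c) for c, acc in EVALUATED.items()]
frontier = pareto_frontier(points)

for cost, accuracy, combo in frontier:
    print(f"cost={cost:.1f} accuracy={accuracy:.2f} traits={sorted(combo)}")
```

In this toy data, measuring `canopy_temp` alone is dominated by `NDVI` alone (cheaper and more accurate), so it does not appear on the frontier. The additive cost model is also an assumption of the sketch.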


Data availability

The genome sequencing data obtained with the high-density rice array (HDRA) for all genotypes are available on the website of the “Rice Diversity” project (http://www.ricediversity.org/data/). The datasets generated and analyzed in the present study are available from the “KosukeHamazaki/ROCAPS” repository on GitHub, https://github.com/KosukeHamazaki/ROCAPS.


Acknowledgements

We are grateful to Dr. Ryokei Tanaka for fruitful discussions on the application of Bayesian optimization as well as to Dr. Motoyuki Ishimori and Ms. Miho Maeta for fruitful discussions on how to determine the appropriate phenotyping costs in field trials. We would like to thank Editage (www.editage.com) for English language editing.

Funding

This study was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP 20J2123. This work was also supported by the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) (https://www.jst.go.jp/kisoken/crest/en/index.html), Grant Number JPMJCR16O2, Japan. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

KH developed the proposed method on the Bayesian optimization of multivariate genomic prediction models, conducted all statistical analyses, and drafted the manuscript. HI conceived and designed the study, provided administrative support, and supervised the study. Both authors have read and approved the final manuscript.

Corresponding author

Correspondence to Hiroyoshi Iwata.

Ethics declarations

Conflict of interest

Not applicable. The authors declare that there are no conflicts of interest.

Code availability

The scripts used in the current study are available from the “KosukeHamazaki/ROCAPS” repository on GitHub, https://github.com/KosukeHamazaki/ROCAPS.

Additional information

Communicated by Benjamin Stich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 1260 kb)

About this article

Cite this article

Hamazaki, K., Iwata, H. Bayesian optimization of multivariate genomic prediction models based on secondary traits for improved accuracy gains and phenotyping costs. Theor Appl Genet 135, 35–50 (2022). https://doi.org/10.1007/s00122-021-03949-1

  • DOI: https://doi.org/10.1007/s00122-021-03949-1
