Bayesian optimization of multivariate genomic prediction models based on secondary traits for improved accuracy gains and phenotyping costs

Hamazaki, Kosuke; Iwata, Hiroyoshi

doi:10.1007/s00122-021-03949-1

Original Article
Published: 05 October 2021

Volume 135, pages 35–50 (2022)

Theoretical and Applied Genetics

Abstract

Key message

We propose a novel approach to the Bayesian optimization of multivariate genomic prediction models based on secondary traits to improve accuracy gains and phenotyping costs via efficient Pareto frontier estimation.

Abstract

Multivariate genomic prediction based on secondary traits, such as data from various omics technologies including high-throughput phenotyping (e.g., unmanned aerial vehicle-based remote sensing), has attracted much attention because it offers higher accuracy than genomic prediction based only on marker genotypes. Although there is a trade-off between the accuracy gains and the phenotyping costs of secondary traits, no attempt has been made to optimize this trade-off. In this study, we propose a novel approach to optimize multivariate genomic prediction models that use secondary traits measurable at early growth stages, balancing accuracy gains against phenotyping costs. The proposed approach employs Bayesian optimization for efficient estimation of the Pareto frontier, which represents the maximum accuracy at a given cost. The proposed approach successfully estimated the optimal secondary trait combinations across a range of costs while performing genomic prediction for only about 20% of all possible combinations. Simulation results showed that the obtained optimal combinations were reasonable, reflecting the characteristics of each scenario of the simulated target traits. Analysis of real-time target trait data showed that the proposed multivariate genomic prediction model was significantly more accurate than the univariate genomic prediction model.
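To make the notion of a Pareto frontier over cost and accuracy concrete, the following minimal Python sketch may help. It is not the authors' implementation: the trait names, costs, and accuracy values are invented for illustration, and the sketch evaluates every combination exhaustively, whereas the approach described above uses Bayesian optimization precisely to avoid evaluating all combinations.

```python
from itertools import combinations

# Hypothetical per-trait phenotyping costs (arbitrary units, made up).
trait_costs = {"A": 1.0, "B": 2.0, "C": 3.0}

# Hypothetical prediction accuracies for each trait combination (made up).
# The empty tuple stands for prediction from marker genotypes alone.
accuracy = {
    (): 0.40,
    ("A",): 0.45,
    ("B",): 0.44,
    ("C",): 0.52,
    ("A", "B"): 0.50,
    ("A", "C"): 0.55,
    ("B", "C"): 0.53,
    ("A", "B", "C"): 0.56,
}


def pareto_frontier(candidates):
    """Keep candidates for which no other candidate is both no more costly
    and strictly more accurate (ties on cost keep the most accurate one)."""
    # Sort by cost ascending; within equal cost, highest accuracy first.
    ordered = sorted(candidates, key=lambda c: (c["cost"], -c["accuracy"]))
    frontier = []
    best_accuracy = float("-inf")
    for cand in ordered:
        # A candidate joins the frontier only if it beats every cheaper one.
        if cand["accuracy"] > best_accuracy:
            frontier.append(cand)
            best_accuracy = cand["accuracy"]
    return frontier


# Enumerate all subsets of traits with their total cost and accuracy.
names = sorted(trait_costs)
candidates = [
    {
        "traits": combo,
        "cost": sum(trait_costs[t] for t in combo),
        "accuracy": accuracy[combo],
    }
    for k in range(len(names) + 1)
    for combo in combinations(names, k)
]

frontier = pareto_frontier(candidates)
for cand in frontier:
    print(cand["traits"], cand["cost"], cand["accuracy"])
```

With these made-up numbers, the combinations ("B",), ("A", "B"), and ("B", "C") drop out because a cheaper or equally costly combination achieves higher accuracy; the remaining points trace the maximum accuracy attainable at each cost level.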

Data availability

The genome sequencing data obtained with the high-density rice array (HDRA) for all genotypes are available on the website of the “Rice Diversity” project (http://www.ricediversity.org/data/). The datasets generated and analyzed in the present study are available from the “KosukeHamazaki/ROCAPS” repository on GitHub, https://github.com/KosukeHamazaki/ROCAPS.

Acknowledgements

We are grateful to Dr. Ryokei Tanaka for fruitful discussions on the application of Bayesian optimization as well as to Dr. Motoyuki Ishimori and Ms. Miho Maeta for fruitful discussions on how to determine the appropriate phenotyping costs in field trials. We would like to thank Editage (www.editage.com) for English language editing.

Funding

This study was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP 20J2123. This work was also supported by the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) (https://www.jst.go.jp/kisoken/crest/en/index.html) Grant Number JPMJCR16O2, Japan. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Kosuke Hamazaki & Hiroyoshi Iwata
JSPS Research Fellow, Tokyo, Japan
Kosuke Hamazaki

Contributions

KH developed the proposed method on the Bayesian optimization of multivariate genomic prediction models, conducted all statistical analyses, and drafted the manuscript. HI conceived and designed the study, provided administrative support, and supervised the study. Both authors have read and approved the final manuscript.

Corresponding author

Correspondence to Hiroyoshi Iwata.

Ethics declarations

Conflict of interest

Not applicable. The authors declare that there are no conflicts of interest.

Code availability

The scripts used in the current study are available from the “KosukeHamazaki/ROCAPS” repository on GitHub, https://github.com/KosukeHamazaki/ROCAPS.

Additional information

Communicated by Benjamin Stich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 1260 kb)

About this article

Cite this article

Hamazaki, K., Iwata, H. Bayesian optimization of multivariate genomic prediction models based on secondary traits for improved accuracy gains and phenotyping costs. Theor Appl Genet 135, 35–50 (2022). https://doi.org/10.1007/s00122-021-03949-1

Received: 21 November 2020
Accepted: 14 September 2021
Published: 05 October 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s00122-021-03949-1
