Abstract
Key message
We propose a novel approach to the Bayesian optimization of multivariate genomic prediction models based on secondary traits that balances accuracy gains against phenotyping costs via efficient Pareto frontier estimation.
Abstract
Multivariate genomic prediction based on secondary traits, such as data from various omics technologies including high-throughput phenotyping (e.g., unmanned aerial vehicle-based remote sensing), has attracted much attention because it offers higher accuracy than genomic prediction based only on marker genotypes. Although there is a trade-off between the accuracy gains and the phenotyping costs of secondary traits, no attempt has been made to optimize this trade-off. In this study, we propose a novel approach that optimizes multivariate genomic prediction models based on secondary traits measurable at early growth stages with respect to both accuracy gains and phenotyping costs. The proposed approach employs Bayesian optimization for efficient estimation of the Pareto frontier, which represents the maximum accuracy attainable at a given cost. The proposed approach successfully estimated the optimal secondary trait combinations across a range of costs while performing genomic prediction for only about \(20 \%\) of all possible combinations. In simulations, the optimal combinations obtained reflected the characteristics of each simulated target trait scenario, indicating that they were reasonable. Analysis of real-time target trait data showed that the proposed multivariate genomic prediction model had significantly superior accuracy compared to the univariate genomic prediction model.
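To make the notion of a Pareto frontier concrete, the following minimal Python sketch extracts, from a set of candidate secondary trait combinations, those that give the maximum accuracy at a given cost. It is only an illustration of the definition: the trait names, costs, and accuracies are hypothetical values that do not come from the study, and the sketch evaluates every candidate exhaustively, whereas the proposed approach uses Bayesian optimization precisely to avoid doing so.

```python
def pareto_frontier(points):
    """Return candidates not matched or beaten in accuracy by any candidate of equal or lower cost.

    Each point is a tuple (cost, accuracy, traits).
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by ascending cost; at equal cost, put the most accurate candidate first.
    for cost, accuracy, traits in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a candidate only if it improves on every cheaper (or equally cheap) one.
        if accuracy > best_accuracy:
            frontier.append((cost, accuracy, traits))
            best_accuracy = accuracy
    return frontier


# Hypothetical (cost, prediction accuracy, secondary traits) candidates.
# All names and numbers are made up for illustration.
candidates = [
    (0.0, 0.40, ()),  # marker genotypes only
    (1.0, 0.48, ("NDVI",)),
    (2.0, 0.45, ("canopy_height",)),
    (3.0, 0.52, ("NDVI", "canopy_height")),
    (4.0, 0.50, ("biomass",)),
    (5.0, 0.55, ("NDVI", "biomass")),
    (6.0, 0.51, ("canopy_height", "biomass")),
    (7.0, 0.56, ("NDVI", "canopy_height", "biomass")),
]

frontier = pareto_frontier(candidates)
for cost, accuracy, traits in frontier:
    print(f"cost={cost:.1f}  accuracy={accuracy:.2f}  traits={traits}")
```

In this toy example, a combination such as `("biomass",)` is excluded because a cheaper combination already achieves higher accuracy.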
Data availability
The genome sequencing data obtained with the high-density rice array (HDRA) for all genotypes are available on the website of the “Rice Diversity” project (http://www.ricediversity.org/data/). The datasets generated and analyzed in the present study are available from the “KosukeHamazaki/ROCAPS” repository on GitHub, https://github.com/KosukeHamazaki/ROCAPS.
Acknowledgements
We are grateful to Dr. Ryokei Tanaka for fruitful discussions on the application of Bayesian optimization as well as to Dr. Motoyuki Ishimori and Ms. Miho Maeta for fruitful discussions on how to determine the appropriate phenotyping costs in field trials. We would like to thank Editage (www.editage.com) for English language editing.
Funding
This study was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP 20J2123. This work was also supported by the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (CREST) (https://www.jst.go.jp/kisoken/crest/en/index.html) Grant Number JPMJCR16O2, Japan. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
KH developed the proposed method on the Bayesian optimization of multivariate genomic prediction models, conducted all statistical analyses, and drafted the manuscript. HI conceived and designed the study, provided administrative support, and supervised the study. Both authors have read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest.
Code availability
The scripts used in the current study are available from the “KosukeHamazaki/ROCAPS” repository on GitHub, https://github.com/KosukeHamazaki/ROCAPS.
Additional information
Communicated by Benjamin Stich.
About this article
Cite this article
Hamazaki, K., Iwata, H. Bayesian optimization of multivariate genomic prediction models based on secondary traits for improved accuracy gains and phenotyping costs. Theor Appl Genet 135, 35–50 (2022). https://doi.org/10.1007/s00122-021-03949-1
DOI: https://doi.org/10.1007/s00122-021-03949-1