A robust Bayesian genome-based median regression model
Abstract
Key message
Current genome-enabled prediction models assume normally distributed errors, which makes them sensitive to outliers. We propose a model whose errors are assumed to follow a Laplace distribution, which handles outliers better.
Abstract
Current genome-enabled prediction models use regressions that fit the expected value (mean) of a response variable under the assumption of normally distributed errors, which makes them sensitive to outliers, whether genetic or environmental. We therefore propose a robust Bayesian genome median regression (BGMR) model that fits the regression to the median of the response distribution, with errors assumed to follow a Laplace distribution so that outliers are handled better. The BGMR model was fitted within a Bayesian framework by Markov chain Monte Carlo sampling, using a location–scale mixture representation of the Laplace distribution. We applied BGMR to two simulated and two real genomic data sets and compared its prediction performance with that of a conventional genomic best linear unbiased prediction (GBLUP) model and the Laplace maximum a posteriori (LMAP) method. When outliers were present, BGMR achieved higher prediction accuracies than GBLUP and LMAP. The BGMR model could be useful to breeders who need to predict and select genotypes from data containing unknown outliers.
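To illustrate how a mixture representation of the Laplace distribution makes median regression amenable to Gibbs sampling, the following is a minimal sketch and not the authors' implementation. It relies on the standard fact that a Laplace(0, b) error can be written as a normal error with variance w, where w is exponential with mean 2b². Everything else is an assumption made for illustration: the function name `gibbs_median_regression`, a normal prior with fixed variance `prior_var` on the regression coefficients, an inverse-gamma prior with hyperparameters `a0` and `c0` on b², and the small simulated data set with one-sided outliers.

```python
import numpy as np

def gibbs_median_regression(X, y, n_iter=2000, burn_in=500,
                            prior_var=100.0, a0=1.0, c0=1.0, seed=0):
    """Gibbs sampler for y = X beta + e with e ~ Laplace(0, b).

    Assumed priors (illustrative only): beta ~ N(0, prior_var * I),
    b^2 ~ InvGamma(a0, c0). Returns the posterior mean of beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.ones(n)      # latent mixing variances, one per observation
    b2 = 1.0            # squared Laplace scale
    draws = []
    for it in range(n_iter):
        # beta | w, y: weighted ridge-type normal update
        Xw = X / w[:, None]
        prec = X.T @ Xw + np.eye(p) / prior_var
        mean = np.linalg.solve(prec, Xw.T @ y)
        L = np.linalg.cholesky(prec)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))
        # 1/w_i | beta, b2 ~ InverseGaussian(mean=1/(b|e_i|), shape=1/b^2)
        abs_e = np.maximum(np.abs(y - X @ beta), 1e-8)
        w = 1.0 / rng.wald(1.0 / (np.sqrt(b2) * abs_e), 1.0 / b2)
        # b^2 | w ~ InvGamma(a0 + n, c0 + sum(w) / 2)
        b2 = 1.0 / rng.gamma(a0 + n, 1.0 / (c0 + 0.5 * w.sum()))
        if it >= burn_in:
            draws.append(beta)
    return np.mean(draws, axis=0)

# Simulated example: Laplace noise plus roughly 10% large positive outliers.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.laplace(0.0, 0.5, size=n)
y[rng.random(n) < 0.1] += 15.0

beta_hat = gibbs_median_regression(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # mean regression, for contrast
print("median regression:", beta_hat)
print("least squares:    ", beta_ols)
```

Observations with large residuals receive large latent variances w, so they are down-weighted in the update for the coefficients. In this sketch, that down-weighting is the mechanism behind the robustness to outliers.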
Notes
Acknowledgments
We thank all scientists, field workers, and lab assistants from National Programs and CIMMYT who collected the data used in this study. We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR Grant 267806. We are also thankful for the financial support provided by CIMMYT CRP (maize and wheat), the Bill & Melinda Gates Foundation, as well as the USAID projects (Cornell University and Kansas State University) that financed the collection of the CIMMYT maize and wheat data analyzed in this study.
Compliance with ethical standards
Conflict of interest
The authors declare they do not have any conflict of interest.