A robust Bayesian genome-based median regression model

  • Abelardo Montesinos-López
  • Osval A. Montesinos-López
  • Enrique R. Villa-Diharce
  • Daniel Gianola
  • José Crossa
Original Article

Abstract

Key message

Current genome-enabled prediction models assume normally distributed errors, which makes them sensitive to outliers. We propose a model whose errors are assumed to follow a Laplace distribution to deal better with outliers.
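
As a brief sketch of why Laplace errors lead to a median fit, consider the simplest case of a single location parameter, y_i = μ + e_i, with independent errors e_i following a Laplace distribution with location 0 and scale b. The symbols μ, b, and n are introduced here for illustration only and are not taken from the paper's notation.

```latex
\begin{aligned}
p(y_i \mid \mu, b) &= \frac{1}{2b}\exp\!\left(-\frac{|y_i-\mu|}{b}\right),\\
\log L(\mu, b) &= -n\log(2b) - \frac{1}{b}\sum_{i=1}^{n}|y_i-\mu|,\\
\hat{\mu} &= \arg\min_{\mu}\sum_{i=1}^{n}|y_i-\mu| = \operatorname{median}(y_1,\dots,y_n).
\end{aligned}
```

Under normal errors the same steps yield a sum of squared deviations, which is minimized by the sample mean, and a single extreme observation can shift that mean arbitrarily far.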

Abstract

Current genome-enabled prediction models use regressions that fit the expected value (mean) of a response variable under the assumption of normally distributed errors; such fits are often sensitive to outliers, whether genetic or environmental. For this reason, we propose a robust Bayesian genome median regression (BGMR) model that fits the regression to the median of the response distribution, with errors assumed to follow a Laplace distribution to deal better with outliers. The BGMR model was evaluated under a Bayesian framework with Markov chain Monte Carlo sampling using a location–scale mixture representation of the Laplace distribution. BGMR was applied to two simulated and two real genomic data sets, and we compared its prediction performance with that of a conventional genomic best linear unbiased prediction (GBLUP) model and the Laplace maximum a posteriori (LMAP) method. The prediction accuracies of BGMR were higher than those of the GBLUP and LMAP methods when there were outliers. The BGMR model could be useful to breeders who need to predict and select genotypes based on data with unknown outliers.
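
The mixture representation mentioned above can be illustrated with a minimal sketch. It assumes the standard identity that a Laplace variable is a normal variable whose variance is scaled by an exponential weight; the abstract does not give the paper's exact parameterization, so the function name `laplace_via_mixture`, the `scale` argument, and the sample size are hypothetical choices for illustration. The practical appeal of this form is that, conditional on the latent weights, the errors are normal, which is what typically makes Gibbs-type sampling steps tractable.

```python
import math
import random
import statistics


def laplace_via_mixture(n, scale=1.0, seed=0):
    """Draw n values from a normal scale mixture with exponential weights.

    Assumed identity (not the paper's notation): if w ~ Exponential(rate=1)
    and e | w ~ Normal(0, 2 * scale**2 * w), then marginally
    e ~ Laplace(location=0, scale=scale).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.expovariate(1.0)             # latent mixing weight
        sd = scale * math.sqrt(2.0 * w)      # conditional standard deviation
        draws.append(rng.gauss(0.0, sd))     # conditionally normal error
    return draws


samples = laplace_via_mixture(200_000, scale=1.0)

# Moments of Laplace(0, b): E|e| = b and E[e^2] = 2 * b**2.
mean_abs = statistics.fmean(abs(x) for x in samples)
second_moment = statistics.fmean(x * x for x in samples)
print(round(mean_abs, 2), round(second_moment, 2))
```

With `scale=1.0`, the two printed Monte Carlo estimates should be close to the theoretical values 1 and 2, which is a quick check that the mixture reproduces the Laplace distribution.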

Notes

Acknowledgments

We thank all scientists, field workers, and lab assistants from National Programs and CIMMYT who collected the data used in this study. We acknowledge the financial support provided by the Foundation for Research Levy on Agricultural Products (FFL) and the Agricultural Agreement Research Fund (JA) in Norway through NFR Grant 267806. We are also thankful for the financial support provided by CIMMYT CRP (maize and wheat), the Bill & Melinda Gates Foundation, as well as the USAID projects (Cornell University and Kansas State University) that financed the collection of the CIMMYT maize and wheat data analyzed in this study.

Compliance with ethical standards

Conflict of interest

The authors declare they do not have any conflict of interest.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara, Mexico
  2. Facultad de Telemática, Universidad de Colima, Colima, Mexico
  3. Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Mexico
  4. Departments of Animal Sciences, Dairy Science, and Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, USA
  5. Biometrics and Statistics Unit and Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico
