Theoretical and Applied Genetics, Volume 125, Issue 3, pp 419–435

Overview of LASSO-related penalized regression methods for quantitative trait mapping and genomic selection

Review

Abstract

Quantitative trait loci (QTL)/association mapping aims to find genomic loci associated with phenotypes, whereas genomic selection focuses on predicting breeding values from genomic data. Variable selection is key to both tasks because it makes it possible to (1) detect clear mapping signals of QTL activity, and (2) predict genome-enhanced breeding values accurately. In this paper, we provide an overview of a statistical method called the least absolute shrinkage and selection operator (LASSO) and two of its generalizations, the elastic net and the adaptive LASSO, in the contexts of QTL mapping and genomic breeding value prediction in plants (or animals). We also briefly summarize the Bayesian interpretation of LASSO and the hierarchical Bayesian models it has inspired. We illustrate the implementation and examine the performance of the methods using three public data sets: (1) the North American barley data with 127 individuals and 145 markers, (2) the simulated QTLMAS XII data with 5,865 individuals and 6,000 markers, used for both QTL mapping and genomic selection, and (3) a wheat data set with 599 individuals and 1,279 markers, used only for genomic selection.
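
For orientation, the three penalized least-squares criteria named in the abstract can be written in their standard textbook form. This is only a sketch: the notation is assumed here and may differ from that used in the body of the paper. Phenotypes are denoted $y_i$, marker genotype codes $x_{ij}$, the intercept $\beta_0$, marker effects $\beta_j$, tuning parameters $\lambda, \lambda_1, \lambda_2 \ge 0$, and $\tilde{\beta}_j$ is an initial estimate (for example from ridge regression or ordinary LASSO).

```latex
% Standard forms of the three criteria; notation is an assumption, not the paper's own.
\begin{align*}
\text{LASSO:} \quad
  \hat{\beta} &= \operatorname*{arg\,min}_{\beta_0,\,\beta}
  \Biggl\{ \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \Biggr\} \\[4pt]
\text{Elastic net (naive form):} \quad
  \hat{\beta} &= \operatorname*{arg\,min}_{\beta_0,\,\beta}
  \Biggl\{ \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^{2} \Biggr\} \\[4pt]
\text{Adaptive LASSO:} \quad
  \hat{\beta} &= \operatorname*{arg\,min}_{\beta_0,\,\beta}
  \Biggl\{ \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert \Biggr\},
  \qquad w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}}, \quad \gamma > 0
\end{align*}
```

In all three criteria the absolute-value penalty can shrink some $\beta_j$ exactly to zero, which is what makes these methods usable for variable selection. Setting $\lambda_2 = 0$ in the elastic net, or $w_j = 1$ for all $j$ in the adaptive LASSO, recovers the ordinary LASSO.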

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
  2. Department of Mathematical Sciences, University of Oulu, Oulu, Finland
  3. Department of Biology, University of Oulu, Oulu, Finland
  4. Biocenter Oulu, Oulu, Finland
  5. Department of Agricultural Sciences, University of Helsinki, Helsinki, Finland
