Overview of LASSO-related penalized regression methods for quantitative trait mapping and genomic selection

Li, Zitong; Sillanpää, Mikko J.

doi:10.1007/s00122-012-1892-9

Review
Published: 24 May 2012

Volume 125, pages 419–435 (2012)
Theoretical and Applied Genetics

Zitong Li¹ & Mikko J. Sillanpää¹,²,³,⁴,⁵

Abstract

Quantitative trait loci (QTL)/association mapping aims at finding genomic loci associated with phenotypes, whereas genomic selection focuses on predicting breeding values from genomic data. Variable selection is key to both tasks because it allows one to (1) detect clear mapping signals of QTL activity and (2) predict genome-enhanced breeding values accurately. In this paper, we provide an overview of a statistical method called the least absolute shrinkage and selection operator (LASSO) and two of its generalizations, the elastic net and the adaptive LASSO, in the contexts of QTL mapping and genomic breeding value prediction in plants (or animals). We also briefly summarize the Bayesian interpretation of the LASSO and the hierarchical Bayesian models it has inspired. We illustrate the implementation and examine the performance of the methods using three public data sets: (1) North American barley data with 127 individuals and 145 markers, (2) simulated QTLMAS XII data with 5,865 individuals and 6,000 markers, used for both QTL mapping and genomic selection, and (3) wheat data with 599 individuals and 1,279 markers, used only for genomic selection.
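The Bayesian interpretation of the LASSO mentioned above rests on a standard argument (Tibshirani 1996; Park and Casella 2008), sketched here in a simplified form that holds $\sigma^2$ and the prior rate $\tau$ fixed. Assume a Gaussian likelihood and independent Laplace (double-exponential) priors on the marker effects:

$$ y_i\mid\boldsymbol{\beta},\sigma^2 \sim N\left(\sum_{j=1}^{p}x_{ij}\beta_j,\ \sigma^2\right), \qquad p(\beta_j\mid\tau)=\frac{\tau}{2}\exp\left(-\tau|\beta_j|\right). $$

The negative log posterior of $\boldsymbol{\beta}$ is then, up to an additive constant,

$$ -\log p(\boldsymbol{\beta}\mid\mathbf{y},\sigma^2,\tau) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-\sum_{j=1}^{p}x_{ij}\beta_j\right)^2 + \tau\sum_{j=1}^{p}|\beta_j| + \text{const}. $$

Multiplying by $2\sigma^2$ shows that the posterior mode minimizes the residual sum of squares plus $2\sigma^2\tau\sum_j|\beta_j|$, i.e., it is a LASSO estimate; in the $\frac{1}{2n}$ scaling of Eq. (16) with $\alpha = 1$, this corresponds to $\lambda = \sigma^2\tau/n$.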

References

Akaike H (1974) A new look at the statistical model identification. IEEE T Autom Contr 19:716–723
Alexander DH, Lange K (2011) Stability selection for genome-wide association. Genet Epidemiol 35:722–728
Ayers KL, Cordell HJ (2010) SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet Epidemiol 34:879–891
Bernardo R, Yu J (2007) Prospects for genomewide selection for quantitative traits in maize. Crop Sci 47:1082–1090
Broman KW, Speed TP (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses. J Roy Stat Soc B 64:641–656
Bühlmann P, Meier L (2008) Discussion of “One-step sparse estimates in nonconcave penalized likelihood models” (authors Zou H and Li R). Ann Stat 36:1534–1541
Bühlmann P, van de Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer, New York
Burgueño J, DeLos Campos G, Weigel K, Crossa J (2012) Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci 52:707–719
Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95:759–771
Chen J, Cui W (2010) A two-phase procedure for QTL mapping with regression models. Theor Appl Genet 121:363–372
Cho S, Kim K, Kim YJ, Lee JK, Cho YS, Lee JY, Han BG, Kim H, Ott J, Park T (2010) Joint identification of multiple genetic variants via elastic-net variable selection in a genome-wide association analysis. Ann Hum Genet 74:416–428
Clark SA, Hickey JM, van der Werf JHJ (2011) Different models of genetic variation and their effect on genomic evaluation. Genet Sel Evol 43:18
Crooks L, Sahana G, De Koning DJ, Lund MS, Carlborg Ö (2009) Comparison of analyses of the QTLMAS XII common dataset. II: genome-wide association and fine mapping. BMC Proc 3:S2
Crossa J, DeLos Campos G, Pérez P, Gianola D, Burgueño J, Araus JL, Makumbi D, Singh RP, Dreisigacker S, Yan J, Arief V, Banziger M, Braun H-J (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186:713–724
Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA (2010) The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031
Dekkers JCM (2010) Use of high-density marker genotype for genetic improvement of livestock by genomic selection. CAB Reviews 5
Dekkers JCM, Hospital F (2002) The use of molecular genetics in the improvement of agricultural populations. Nat Rev Genet 3:22–32
DeLos Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, Weigel K, Cotes JM (2009) Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182:375–385
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32:407–451
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Figueiredo MAT (2003) Adaptive sparseness for supervised learning. IEEE Trans Pattern Anal Mach Intell 25:1150–1159
Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1:302–332
Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33:1
Goddard ME, Hayes BJ (2007) Genomic selection. J Anim Breed Genet 124:323–330
Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397
Haley CS, Knott SA (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324
Harris BL, Johnson DL (2010) SNP selection using elastic net, with application to genomic selection. In: 9th World Congress on Genetics Applied to Livestock Production, Leipzig, Germany. http://www.kongressband.de/wcgalp2010/assets/pdf/0282.pdf
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning. Springer, New York
Heffner EL, Sorrells ME, Jannink JL (2009) Genomic selection for crop improvement. Crop Sci 49:1–12
Hesterberg T, Choi NH, Meier L, Fraley C (2008) Least angle and $\ell_1$ penalized regression: a review. Stat Surv 2:61–93
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Huang J, Ma S, Zhang CH (2008) Adaptive Lasso for sparse high-dimensional regression models. Stat Sin 18:1603–1618
Jannink JL, Bink MCAM, Jansen RC (2001) Using complex plant pedigrees to map valuable genes. Trends Plant Sci 6:337–342
Kyung M, Gill J, Ghosh M, Casella G (2010) Penalized regression, standard errors, and Bayesian Lassos. Bayesian Anal 5:369–412
Legarra A, Robert-Granié C, Croiseau P, Guillaume F, Fritz S (2011) Improved Lasso for genomic selection. Genet Res 93:77–87
Li Q, Lin N (2010) The Bayesian elastic net. Bayesian Anal 5:151–170
Li Z, Sillanpää MJ (2012) Estimation of quantitative trait locus effects with epistasis by variational Bayes algorithms. Genetics 190:231–249
Li J, Das K, Fu G, Li R, Wu R (2011) The Bayesian LASSO for genome-wide association studies. Bioinformatics 27:516–523
Lund MS, Sahana G, De Koning DJ, Su G, Carlborg Ö (2009) Comparison of analyses of the QTLMAS XII common dataset. I: Genomic selection. BMC Proc 3:S1
Meinshausen N (2007) Relaxed LASSO. Comput Stat Data An 52:374–393
Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the lasso. Ann Stat 34:1436–1462
Meinshausen N, Bühlmann P (2010) Stability selection. J Roy Stat Soc B 72:417–473
Meinshausen N, Meier L, Bühlmann P (2009) P-values for high-dimensional regression. J Am Stat Assoc 104:1671–1681
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Mutshinda CM, Sillanpää MJ (2010) Extended Bayesian LASSO for multiple quantitative trait loci mapping and unobserved phenotype prediction. Genetics 186:1067–1075
Osborne M, Presnell B, Turlach B (2000) A new approach to variable selection in least squares problems. IMA J Numer Anal 20:389–404
Park T, Casella G (2008) The Bayesian LASSO. J Am Stat Assoc 103:681–686
Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554
Pérez P, DeLos Campos G, Crossa J, Gianola D (2010) Genomic-enabled prediction based on molecular markers and pedigree using the BLR package in R. Plant Genome 3:106–116
Piepho HP (2009) Ridge regression and extensions for genomewide selection in maize. Crop Sci 49:1165–1176
Piepho HP, Ogutu JO, Schulz-Streeck T, Estaghvirou B, Gordillo A, Technow F (2012) Efficient computation of ridge-regression BLUP in genomic selection in plant breeding. Crop Sci 52:1093–1104
Shepherd RK, Meuwissen THE, Woolliams JA (2010) Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers. BMC Bioinforma 11:529
Siegmund D, Yakir B (2007) The statistics of gene mapping. Springer, Berlin
Sillanpää MJ (2011) Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses. Heredity 106:511–519
Sillanpää MJ, Corander J (2002) Model choice in gene mapping: what and why. Trends Genet 18:301–307
Simon N, Friedman J, Hastie T, Tibshirani R (2011) Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Softw 39:5
Solberg TR, Sonesson AK, Woolliams JA, Ødegard J, Meuwissen THE (2009) Persistence of accuracy of genome-wide breeding values over generations when including a polygenic effect. Genet Sel Evol 41:53
Sun W, Ibrahim JG, Zou F (2010) Genomewide multiple-loci mapping in experimental crosses by iterative adaptive penalized regression. Genetics 185:349–359
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58:267–288
Tinker NA, Mather DE, Rossnagel BG, Kasha KJ, Kleinhofs A et al (1996) Regions of the genome that affect agronomic performance in two-row barley. Crop Sci 36:1053–1062
Usai MG, Goddard ME, Hayes BJ (2009) LASSO with cross-validation for genomic selection. Genet Res 91:427–436
Valdar W, Solberg LC, Gauguier D, Cookson WO, Rawlins JNP, Mott R, Flint J (2006) Genetic and environmental effects on complex traits in mice. Genetics 174:959–984
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Wang D, Eskridge KM, Crossa J (2010) Identifying QTLs and epistasis in structured plant populations using adaptive mixed LASSO. J Agric Biol Envir S 16:170–184
Wasserman L, Roeder K (2009) High dimensional variable selection. Ann Stat 37:2178–2201
Wu TT, Chen YF, Hastie T, Sobel E, Lange K (2009) Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25:714–721
Xu S (2003) Estimating polygenic effects using markers of the entire genome. Genetics 163:789–801
Xu S (2007) An empirical Bayes method for estimating epistatic effects of quantitative trait loci. Biometrics 63:513–521
Yi N, Xu S (2008) Bayesian LASSO for quantitative trait loci mapping. Genetics 179:1045–1055
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, McMullen MD, Gaut BS, Nielsen DM, Holland JB, Kresovich S, Buckler ES (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:203–208
Zhao P, Yu B (2006) On model selection consistency of LASSO. J Mach Learn Res 7:2541–2563
Zhou S (2010) Thresholded Lasso for high dimensional variable selection and statistical estimation. arXiv:1002.1583v2
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J Roy Stat Soc B 67:301–320
Zou H, Hastie T (2008) Model building and feature selection with genomic data. In: Liu H, Motoda H (eds) Computational methods of feature selection, chapter 20, pp 393–411. Chapman & Hall, London
Zou H, Zhang H (2009) On the adaptive elastic-net with a diverging number of parameters. Ann Stat 37:1733–1751
Zou H, Hastie T, Tibshirani R (2007) On the “degrees of freedom” of the lasso. Ann Stat 35:2173–2192

Acknowledgments

We thank Daniel Blande, Mahlako Makgahlela and Crispin Mutshinda for giving constructive suggestions on the manuscript. We are also grateful to two anonymous referees for their valuable comments. This work was supported by the Finnish Graduate School of Population Genetics, and by research grants from the Academy of Finland and the University of Helsinki’s Research Funds.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Helsinki, PO Box 68, 00014, Helsinki, Finland
Zitong Li & Mikko J. Sillanpää
Department of Mathematical Sciences, University of Oulu, PO Box 3000, 90014, Oulu, Finland
Mikko J. Sillanpää
Department of Biology, University of Oulu, PO Box 3000, 90014, Oulu, Finland
Mikko J. Sillanpää
Biocenter Oulu, Oulu, Finland
Mikko J. Sillanpää
Department of Agricultural Sciences, University of Helsinki, Helsinki, Finland
Mikko J. Sillanpää

Corresponding author

Correspondence to Mikko J. Sillanpää.

Additional information

Communicated by R. Varshney.

Appendix: Coordinate descent algorithm

Initially, the marker data are assumed to be standardized and the phenotype data centered, so that $\frac{1}{n}\sum_{i=1}^{n} x_{ij} = 0$ and $\frac{1}{n}\sum_{i=1}^{n} x_{ij}^2 = 1$ for $j = 1, \ldots, p$, and $\frac{1}{n}\sum_{i=1}^{n} y_i = 0$. The elastic net problem (LASSO: $\alpha = 1$; ridge regression: $\alpha = 0$) can then be specified as

$$ \hat{\boldsymbol{\beta}}=\arg\min_{\boldsymbol{\beta}}\left\{\frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\sum_{j=1}^{p}x_{ij}\beta_j\right)^2+\lambda\left[\frac{1-\alpha}{2}\sum_{j=1}^{p}\beta_j^2+\alpha\sum_{j=1}^{p}|\beta_j|\right]\right\}. \tag{16} $$
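A minimal pure-Python sketch of the preprocessing and of the objective in Eq. (16), for readability rather than speed. The helper names `standardize` and `elastic_net_objective` are hypothetical, and so is the data layout: genotypes are assumed to be a plain list of rows (for example 0/1/2 allele counts). Each column is scaled to unit mean square, $\frac{1}{n}\sum_i x_{ij}^2 = 1$, which is the convention of Friedman et al. (2010) and the one under which the $\frac{1}{n}$ factor in the coordinate-wise update of Eq. (17) is exact. Setting monomorphic (zero-variance) markers to all zeros, instead of dividing by zero, is an assumed design choice.

```python
def standardize(X, y):
    """Center and scale each marker column; center the phenotype.

    X: list of n rows with p marker codes each; y: list of n phenotypes.
    Columns are scaled so that (1/n) * sum_i x_ij**2 == 1. Monomorphic
    columns are returned as all zeros (an assumption, not from the source).
    """
    n, p = len(X), len(X[0])
    Xs = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        centered = [v - mean for v in col]
        scale = (sum(v * v for v in centered) / n) ** 0.5
        for i in range(n):
            Xs[i][j] = centered[i] / scale if scale > 0 else 0.0
    y_mean = sum(y) / n
    return Xs, [v - y_mean for v in y]


def elastic_net_objective(X, y, beta, lam, alpha):
    """Objective of Eq. (16): alpha = 1 gives the LASSO, alpha = 0 ridge."""
    n = len(y)
    rss = sum((y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2
              for i in range(n))
    l2 = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return rss / (2 * n) + lam * ((1 - alpha) / 2 * l2 + alpha * l1)


# Toy example: 4 individuals, 2 markers (made-up numbers for illustration)
Xs, yc = standardize([[0, 1], [1, 1], [2, 0], [1, 2]], [1.0, 2.0, 3.0, 6.0])
print(elastic_net_objective(Xs, yc, [0.0, 0.0], lam=0.1, alpha=0.5))  # 1.75
```

At $\boldsymbol{\beta} = \mathbf{0}$ the penalty vanishes and the objective reduces to $\frac{1}{2n}\sum_i y_i^2$, here $14/8 = 1.75$.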

Coordinate descent minimizes the elastic net objective one coefficient at a time: holding all other components fixed, it updates each $\beta_j$ in turn to the value that minimizes the objective along that coordinate. Suppose the current estimate of $\beta_j$ is $\beta^{(0)}_j$ and that $\beta_1, \beta_2, \ldots, \beta_{j-1}$ have already been updated to $\beta^{(1)}_1, \beta^{(1)}_2, \ldots, \beta^{(1)}_{j-1}$. The new estimate $\beta^{(1)}_j$ is then

$$ \beta^{(1)}_j(\lambda)=\frac{S\left(\beta^{(0)}_j+\frac{1}{n}\sum_{i=1}^{n}x_{ij}r_i,\ \lambda\alpha\right)}{1+\lambda(1-\alpha)}, \tag{17} $$

where $S(a, b)$ is the soft-thresholding function defined as

$$ S(a,b) = \operatorname{sign}(a)\cdot\max\left(|a|-b,\,0\right), \tag{18} $$

and $r_i = y_i - \sum_{k=1}^{p} x_{ik}\beta_k$, $i = 1, \ldots, n$, is the current residual (computed with $\beta^{(1)}_1, \ldots, \beta^{(1)}_{j-1}$ and $\beta^{(0)}_j, \ldots, \beta^{(0)}_p$). Once $\beta^{(1)}_j$ has been computed, the residuals are updated as $r_i \leftarrow r_i + x_{ij}\left(\beta^{(0)}_j - \beta^{(1)}_j\right)$. The algorithm cycles through the components of $\boldsymbol{\beta}$ in the order $1, 2, \ldots, p, 1, 2, \ldots, p, \ldots$ until the estimates converge.
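The cyclic scheme of Eqs. (17) and (18), including the residual update, can be sketched in pure Python as follows. This is a minimal illustration, not the authors' implementation or the glmnet code: `soft_threshold` and `elastic_net_cd` are hypothetical names, the inputs are assumed to be standardized as above (centered columns with $\frac{1}{n}\sum_i x_{ij}^2 = 1$, centered $y$), and the stopping rule (largest coefficient change in a full sweep below `tol`) is an assumption, since the text only says "until the estimates converge".

```python
def soft_threshold(a, b):
    """S(a, b) = sign(a) * max(|a| - b, 0) of Eq. (18); assumes b >= 0."""
    if a > b:
        return a - b
    if a < -b:
        return a + b
    return 0.0


def elastic_net_cd(X, y, lam, alpha, tol=1e-8, max_sweeps=1000):
    """Cyclic coordinate descent for Eq. (16) using the update of Eq. (17).

    X: list of n rows; columns centered with (1/n) * sum_i x_ij**2 == 1.
    y: centered phenotypes. Stops when the largest coefficient change in a
    full sweep falls below tol (an assumed convergence rule).
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residuals y - X beta for the starting value beta = 0
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):  # cycle 1, 2, ..., p
            old = beta[j]
            z = old + sum(X[i][j] * r[i] for i in range(n)) / n
            new = soft_threshold(z, lam * alpha) / (1 + lam * (1 - alpha))
            delta = old - new
            if delta != 0.0:
                for i in range(n):
                    # r_i <- r_i + x_ij * (beta_j^(0) - beta_j^(1))
                    r[i] += X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Orthogonal toy design whose columns already satisfy the standardization
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.0, 1.0, -1.0, -3.0]
print(elastic_net_cd(X, y, lam=0.5, alpha=1.0))  # LASSO: [1.5, 0.5]
```

Updating the residuals in place keeps each coordinate step at $O(n)$ work instead of recomputing $\mathbf{X}\boldsymbol{\beta}$. In this orthogonal example, each LASSO coefficient is simply $S\left(\frac{1}{n}\sum_i x_{ij}y_i, \lambda\right)$, i.e., $S(2, 0.5) = 1.5$ and $S(1, 0.5) = 0.5$.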

The coordinate descent algorithm can also be used for the adaptive LASSO. In each iteration, the update becomes

$$ \beta^{(1)}_j(\lambda)=S\left(\beta^{(0)}_j+\frac{1}{n}\sum_{i=1}^{n}x_{ij}r_i,\ \frac{\lambda}{|\hat{\beta}_{\mathrm{init},j}|}\right), \tag{19} $$

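where $\hat{\beta}_{\mathrm{init},j}$ are initial estimates, for example from ordinary least squares (OLS) or the standard LASSO. If $\hat{\beta}_{\mathrm{init},j} = 0$, the threshold in Eq. (19) is infinite and $\beta_j$ stays at zero.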
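The adaptive LASSO update of Eq. (19) fits the same cyclic loop with a coefficient-specific threshold and no ridge denominator. The sketch below is self-contained and makes the same standardization and stopping-rule assumptions as a plain elastic net coordinate descent; `adaptive_lasso_cd` is a hypothetical name, and the caller is assumed to supply `beta_init` (for example OLS or LASSO estimates). A zero initial estimate is handled by skipping that coordinate, which is how an infinite threshold behaves.

```python
def soft_threshold(a, b):
    """S(a, b) = sign(a) * max(|a| - b, 0) of Eq. (18); assumes b >= 0."""
    if a > b:
        return a - b
    if a < -b:
        return a + b
    return 0.0


def adaptive_lasso_cd(X, y, lam, beta_init, tol=1e-8, max_sweeps=1000):
    """Cyclic coordinate descent with the adaptive LASSO update of Eq. (19).

    Assumes centered columns with (1/n) * sum_i x_ij**2 == 1 and centered y.
    beta_init: initial estimates; a zero entry means an infinite threshold,
    so that coefficient is kept at zero.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residuals for beta = 0
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            if beta_init[j] == 0.0:
                continue  # threshold lambda / 0 is infinite: beta_j stays 0
            old = beta[j]
            z = old + sum(X[i][j] * r[i] for i in range(n)) / n
            new = soft_threshold(z, lam / abs(beta_init[j]))
            delta = old - new
            if delta != 0.0:
                for i in range(n):
                    r[i] += X[i][j] * delta  # keep residuals in sync
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Orthogonal toy design; here the OLS estimates are [2.0, 1.0]
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.0, 1.0, -1.0, -3.0]
print(adaptive_lasso_cd(X, y, lam=0.5, beta_init=[2.0, 1.0]))  # [1.75, 0.5]
```

The larger initial estimate receives the smaller threshold ($0.5/2 = 0.25$ versus $0.5/1 = 0.5$), so strong effects are shrunk less than under the standard LASSO with the same $\lambda$.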

For more details, see Friedman et al. (2007, 2010).
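Why Eq. (17) gives the exact minimizer along coordinate $j$ can be checked directly, assuming centered columns scaled so that $\frac{1}{n}\sum_i x_{ij}^2 = 1$ (the convention of Friedman et al. 2010). Write the partial residual as $r^{(-j)}_i = r_i + x_{ij}\beta^{(0)}_j$ and let $z_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij} r^{(-j)}_i = \beta^{(0)}_j + \frac{1}{n}\sum_{i=1}^{n} x_{ij}r_i$. As a function of $\beta_j$ alone, the objective in Eq. (16) is

$$ f(\beta_j) = \frac{1}{2n}\sum_{i=1}^{n}\left(r^{(-j)}_i - x_{ij}\beta_j\right)^2 + \lambda\left[\frac{1-\alpha}{2}\beta_j^2 + \alpha|\beta_j|\right] + \text{const} = \frac{1+\lambda(1-\alpha)}{2}\beta_j^2 - z_j\beta_j + \lambda\alpha|\beta_j| + \text{const}. $$

The subgradient optimality condition $\left(1+\lambda(1-\alpha)\right)\beta_j - z_j + \lambda\alpha\, s_j = 0$, with $s_j \in \partial|\beta_j|$, is solved by

$$ \beta_j = \frac{S(z_j,\ \lambda\alpha)}{1+\lambda(1-\alpha)}, $$

which is Eq. (17). Setting $\alpha = 1$ and replacing the penalty $\lambda|\beta_j|$ by $\lambda|\beta_j|/|\hat{\beta}_{\mathrm{init},j}|$ gives Eq. (19) by the same argument.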

About this article

Cite this article

Li, Z., Sillanpää, M.J. Overview of LASSO-related penalized regression methods for quantitative trait mapping and genomic selection. Theor Appl Genet 125, 419–435 (2012). https://doi.org/10.1007/s00122-012-1892-9

Received: 14 November 2011
Accepted: 27 April 2012
Published: 24 May 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s00122-012-1892-9
