Application of support vector regression to genome-assisted prediction of quantitative traits

Long, Nanye; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A.

doi:10.1007/s00122-011-1648-y

Original Paper
Published: 08 July 2011

Volume 123, pages 1065–1074, (2011)

Theoretical and Applied Genetics

Nanye Long¹, Daniel Gianola¹,²,³, Guilherme J. M. Rosa¹,³ & Kent A. Weigel²

Abstract

A byproduct of genome-wide association studies is the possibility of carrying out genome-enabled prediction of disease risk or of quantitative traits. This study is concerned with predicting two quantitative traits, milk yield in dairy cattle and grain yield in wheat, using dense molecular markers as predictors. Two support vector regression (SVR) models, ε-SVR and least-squares SVR, were explored and compared to a widely applied linear regression model, the Bayesian Lasso, which assumes additive marker effects. Predictive performance was measured using predictive correlation and mean squared error of prediction. Depending on the kernel function chosen, SVR can model either linear or nonlinear relationships between phenotypes and marker genotypes. For milk yield, where phenotypes were estimated breeding values of bulls (a linear combination of the data), SVR with a Gaussian radial basis function (RBF) kernel performed slightly better than SVR with a linear kernel, and was similar to the Bayesian Lasso. For the wheat data, where the phenotype was raw grain yield, the RBF kernel provided clear advantages over the linear kernel, e.g., a 17.5% increase in correlation when using the ε-SVR. SVR with an RBF kernel also compared favorably to the Bayesian Lasso in this case. It is concluded that a nonlinear RBF kernel may be an optimal choice for SVR, especially when phenotypes to be predicted have a nonlinear dependency on genotypes, as might have been the case in the wheat data.
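As an illustration of the comparison described in the abstract, the sketch below fits a least-squares SVR with a linear and with a Gaussian RBF kernel and reports predictive correlation and mean squared error of prediction on held-out records. It is a minimal sketch under stated assumptions, not the authors' implementation: the marker matrix and phenotypes are simulated, and the function names, the regularization value `gamma`, and the RBF `bandwidth` are hypothetical choices that would normally be tuned, for example by cross-validation.

```python
import numpy as np


def linear_kernel(A, B):
    """Linear kernel: inner products between rows of A and rows of B."""
    return A @ B.T


def rbf_kernel(A, B, bandwidth):
    """Gaussian RBF kernel exp(-bandwidth * ||a - b||^2)."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-bandwidth * np.maximum(sq_dist, 0.0))


def fit_ls_svr(K, y, gamma):
    """Solve the least-squares SVR linear system for the bias and dual weights.

    [[0, 1'], [1, K + I/gamma]] [bias; alpha] = [0; y]
    """
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(n) / gamma
    solution = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return solution[0], solution[1:]


def predictive_correlation(y, y_hat):
    """Pearson correlation between observed and predicted phenotypes."""
    return float(np.corrcoef(y, y_hat)[0, 1])


def mse_of_prediction(y, y_hat):
    """Mean squared error of prediction."""
    return float(np.mean((y - y_hat) ** 2))


# Simulated data (an assumption, not the dairy or wheat data of the study):
# 300 individuals, 50 markers coded 0/1/2, additive effects plus one interaction.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 50)).astype(float)
y = X[:, :5].sum(axis=1) + 2.0 * X[:, 5] * X[:, 6] + rng.normal(0.0, 1.0, size=300)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

gamma = 10.0                  # hypothetical regularization value
bandwidth = 1.0 / X.shape[1]  # hypothetical RBF bandwidth

kernels = {
    "linear": linear_kernel,
    "rbf": lambda A, B: rbf_kernel(A, B, bandwidth),
}

results = {}
for name, kernel in kernels.items():
    bias, alpha = fit_ls_svr(kernel(X_train, X_train), y_train, gamma)
    y_hat = kernel(X_test, X_train) @ alpha + bias
    results[name] = (predictive_correlation(y_test, y_hat), mse_of_prediction(y_test, y_hat))

for name, (r, mse) in results.items():
    print(f"{name}: predictive correlation = {r:.3f}, MSEP = {mse:.3f}")
```

Which kernel predicts better depends on the simulated signal and on the tuning values, so the printed numbers should not be read as reproducing the results reported for the milk yield or wheat data.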

Acknowledgments

This work was supported by the Wisconsin Agriculture Experiment Station, Aviagen Ltd., and by grants NRICGP/USDA 2003-35205-12833, NSF DEB-0089742 and NSF DMS-044371. We thank the editor and reviewers for their insightful comments.

Author information

Authors and Affiliations

Department of Animal Sciences, University of Wisconsin, 1675 Observatory Dr., Animal Science Bldg, Madison, WI, 53706, USA
Nanye Long, Daniel Gianola & Guilherme J. M. Rosa
Department of Dairy Science, University of Wisconsin, Madison, WI, 53706, USA
Daniel Gianola & Kent A. Weigel
Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, 53706, USA
Daniel Gianola & Guilherme J. M. Rosa

Corresponding author

Correspondence to Nanye Long.

Additional information

Communicated by C. Schön.

About this article

Cite this article

Long, N., Gianola, D., Rosa, G.J.M. et al. Application of support vector regression to genome-assisted prediction of quantitative traits. Theor Appl Genet 123, 1065–1074 (2011). https://doi.org/10.1007/s00122-011-1648-y

Received: 14 March 2011
Accepted: 22 June 2011
Published: 08 July 2011
Issue Date: November 2011
DOI: https://doi.org/10.1007/s00122-011-1648-y
