Prediction of maize single-cross hybrid performance: support vector machine regression versus best linear prediction

Maenhout, Steven; De Baets, Bernard; Haesaert, Geert

doi:10.1007/s00122-009-1200-5

Original Paper
Published: 11 November 2009

Volume 120, pages 415–427, (2010)
Theoretical and Applied Genetics

Steven Maenhout¹,
Bernard De Baets² &
Geert Haesaert¹

Abstract

Accurate prediction of the phenotypic performance of a hybrid plant based on the molecular fingerprints of its parents should lead to a more cost-effective breeding programme, as it reduces the number of expensive field evaluations required. The construction of a reliable prediction model requires a representative sample of hybrids for which both molecular and phenotypic information is accessible. This phenotypic information is usually readily available, as typical breeding programmes test numerous new hybrids in multi-location field trials on a yearly basis. Earlier studies indicated that a linear mixed model analysis of these typically unbalanced phenotypic data makes it possible to construct ε-insensitive support vector machine regression and best linear prediction models for predicting the performance of single-cross maize hybrids. We compare these prediction methods using different subsets of the phenotypic and marker data of a commercial maize breeding programme and evaluate the resulting prediction accuracies by means of a specifically designed field experiment. This balanced field trial makes it possible to assess the reliability of the cross-validation prediction accuracies reported here and in earlier studies. The limits of the predictive capabilities of both prediction methods are further examined by reducing the number of training hybrids and the size of the molecular fingerprints. The results indicate a considerable discrepancy between prediction accuracies obtained by cross-validation procedures and those obtained by correlating the predictions with the results of a validation field trial. The prediction accuracy of best linear prediction was less sensitive to a reduction in the number of training examples than that of support vector machine regression. The latter was, however, better at predicting hybrid performance when the size of the molecular fingerprints was reduced, especially if the initial set of markers had a low information content.

References

Bernardo R (1993) Estimation of coefficient of coancestry using molecular markers in maize. Theor Appl Genet 85:1055–1062
Bernardo R (1994) Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci 34:20–25
Bernardo R (1995) Genetic models for predicting maize single-cross performance in unbalanced yield trial data. Crop Sci 35:141–147
Bernardo R (1996a) Best linear unbiased prediction of the performance of crosses between untested maize inbreds. Crop Sci 36:50–56
Bernardo R (1996b) Best linear unbiased prediction of maize single-cross performance. Crop Sci 36:872–876
Bernardo R (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci 48:1649–1664
Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2:121–167
Charcosset A, Bonnisseau B, Touchebeuf O, Burstin J, Dubreuil P, Barrière Y, Gallais A, Denis JB (1998) Prediction of maize hybrid silage performance using marker data: comparison of several models for specific combining ability. Crop Sci 38:38–44
Cullis B, Gogel B, Verbyla A, Thompson R (1998) Spatial analysis of multi-environment early generation trials. Biometrics 54:1–18
Frisch M, Thiemann A, Fu J, Schrag TA, Scholten S, Melchinger AE (2009) Transcriptome-based distance measures for grouping of germplasm and prediction of hybrid performance in maize. Theor Appl Genet (in press)
Gilmour AR, Cullis BR, Verbyla AP (1997) Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat 2:269–293
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Global Optim 13:455–492
Laloë D (1993) Precision and information in linear models of genetic evaluation. Genet Sel Evol 25:557–576
Maenhout S, De Baets B, Haesaert G, Van Bockstaele E (2007) Support vector machine regression for the prediction of maize hybrid performance. Theor Appl Genet 115:1003–1013
Maenhout S, De Baets B, Haesaert G, Van Bockstaele E (2008) Marker-based screening of maize inbred lines using support vector machine regression. Euphytica 161:123–131
Maenhout S, De Baets B, Haesaert G (2009) Marker-based estimation of the coefficient of coancestry in hybrid breeding programmes. Theor Appl Genet 118:1181–1192
Oakey H, Verbyla AP, Cullis BR, Wei X, Pitchford WS (2007) Joint modeling of additive and non-additive (genetic line) effects in multi-environment trials. Theor Appl Genet 114:1319–1332
Schrag TA, Maurer HP, Melchinger AE, Piepho HP, Peleman J, Frisch M (2007) Prediction of single-cross hybrid performance in maize using haplotype blocks associated with QTL for grain yield. Theor Appl Genet 114:1345–1355
Schrag TA, Möhring J, Maurer HP, Dhillon BS, Melchinger AE, Piepho HP, Sorensen AP, Frisch M (2009) Molecular marker-based prediction of hybrid performance in maize using unbalanced data from multiple experiments with factorial crosses. Theor Appl Genet 118:741–751
Schrag TA, Möhring J, Kusterer B, Dhillon BS, Melchinger AE, Piepho HP, Frisch M (2009) Hybrid performance prediction in maize using molecular markers and joint analyses of hybrids and parental inbreds. Theor Appl Genet (in press)
Smith A, Cullis B, Thompson R (2001) Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57:1138–1147
Smola A, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14:199–222
Stuber C, Cockerham C (1966) Gene effects and variances in hybrid populations. Genetics 54:1279–1286
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Welham SJ, Cullis BR, Gogel BJ, Gilmour AR, Thompson R (2004) Prediction in linear mixed models. Aust NZ J Stat 46:325–347

Acknowledgments

The authors would like to thank the people from RAGT R2n for their unreserved and open-minded scientific contribution to this research. We also gratefully acknowledge the helpful comments and suggestions of two anonymous referees.

Author information

Authors and Affiliations

Department of Biosciences and Landscape Architecture, University College Ghent, Voskenslaan 270, 9000, Gent, Belgium
Steven Maenhout & Geert Haesaert
Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure links 653, 9000, Gent, Belgium
Bernard De Baets

Corresponding author

Correspondence to Steven Maenhout.

Additional information

Communicated by M. Cooper.

Contribution to the special issue “Heterosis in Plants”.

Appendix: Variance structure of the linear mixed models fitted to the phenotypic data of the validation field trial

The four random vectors $\boldsymbol{c}$, $\boldsymbol{a}_{\rm s}$, $\boldsymbol{a}_{\rm o}$ and $\boldsymbol{d}$ of Eq. 4 are assumed to be mutually independent. Furthermore, for each of these vectors $\boldsymbol{h} \in \{ \boldsymbol{c}, \boldsymbol{a}_{\rm s}, \boldsymbol{a}_{\rm o}, \boldsymbol{d} \}$ we assume that the variance has the separable form

$$ \mathrm{Var}(\boldsymbol{h}) = \boldsymbol{g}_e \otimes \boldsymbol{g}_{\rm v}, \tag{7} $$

where $\otimes$ denotes the Kronecker product. $\boldsymbol{g}_e$ represents a 3 × 3 symmetric matrix containing the covariances between environments, while $\boldsymbol{g}_{\rm v}$ represents the covariance between the specified genetic components of the validation trial entries. We start by fitting a completely unstructured variance matrix for $\boldsymbol{g}_e$ while assuming an identity matrix for $\boldsymbol{g}_{\rm v}$. In subsequent steps, the number of REML-estimated variance components is reduced by fitting more parsimonious variance models for $\boldsymbol{g}_e$, using restricted maximum likelihood ratio tests for comparisons between nested models and Akaike's information criterion (AIC) otherwise. We attempt to fit a first-order factor analytic variance model such that $\boldsymbol{g}_e = \boldsymbol{\lambda}\boldsymbol{\lambda}^{\prime} + \boldsymbol{\Psi}$, where $\boldsymbol{\lambda}$ is a vector of factor loadings and $\boldsymbol{\Psi}$ is a diagonal matrix containing three location-specific variances (Smith et al. 2001). To obtain a more parsimonious model, the specific variances are sometimes made equal or set to zero (giving perfect correlation), and/or the loadings are made equal (giving a common covariance; Cullis et al. 1998). In a subsequent reduction, the variances on the diagonal are set equal, which results in a compound symmetry model. The simplest model for $\boldsymbol{g}_e$ assumes zero covariance and equal variances.
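The sequence of variance models for the between-environment matrix can be illustrated numerically. The following is a minimal sketch using NumPy, not the authors' REML fitting procedure: the loadings, specific variances and the helper names `fa1`, `compound_symmetry` and `to_correlation` are all hypothetical and chosen only to show how each restriction changes the 3 × 3 matrix and how the separable form of Eq. 7 is assembled.

```python
import numpy as np


def fa1(loadings, specific_variances):
    """First-order factor analytic model: lambda lambda' + Psi."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    psi = np.diag(np.asarray(specific_variances, dtype=float))
    return lam @ lam.T + psi


def compound_symmetry(n_env, variance, covariance):
    """Equal variances on the diagonal and one common covariance elsewhere."""
    return covariance * np.ones((n_env, n_env)) + (variance - covariance) * np.eye(n_env)


def to_correlation(cov):
    """Convert a covariance matrix to the corresponding correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Hypothetical loadings and location-specific variances for three environments.
loadings = [1.0, 0.8, 0.5]
specific = [0.2, 0.3, 0.1]

# Full first-order factor analytic model.
g_e_fa1 = fa1(loadings, specific)

# Specific variances set to zero: environments become perfectly correlated.
g_e_perfect = fa1(loadings, [0.0, 0.0, 0.0])

# Equal loadings and equal specific variances: compound symmetry.
g_e_cs = fa1([0.7, 0.7, 0.7], [0.2, 0.2, 0.2])

# Simplest model: zero covariance and equal variances.
g_e_simple = compound_symmetry(3, 0.69, 0.0)

# Separable variance of Eq. 7 with an identity matrix for four genetic entries.
g_v = np.eye(4)
var_h = np.kron(g_e_fa1, g_v)

print(np.round(g_e_fa1, 2))
print(var_h.shape)
```

With equal loadings of 0.7 and equal specific variances of 0.2, the factor analytic matrix coincides with a compound symmetry matrix that has variance 0.69 and covariance 0.49, which is why the models in the text form a nested sequence.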

Once the most parsimonious model for $\boldsymbol{g}_e$ is determined, we try different formulations for $\boldsymbol{g}_{\rm v}$. We fit an identity matrix for the variance model of the six check varieties in vector $\boldsymbol{c}$, as no molecular marker or pedigree information is available for these varieties. For the vectors $\boldsymbol{a}_{\rm s}$ and $\boldsymbol{a}_{\rm o}$, containing the GCA effects of the inbred lines, we try to fit either the different coefficient of coancestry-derived matrices $\boldsymbol{A}$ described by Maenhout et al. (2009) or an identity matrix. In a similar way, we compare the different coefficient of fraternity-based matrices $\boldsymbol{D}$ for the variance matrix $\boldsymbol{g}_{\rm v}$ pertaining to the vector $\boldsymbol{d}$. Sometimes, the most parsimonious model is obtained by not using the separable form of Eq. 7 but directly fitting a common GCA or SCA effect for all three locations.
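The difference between a marker-derived matrix and an identity matrix for the genetic component can be seen by inspecting single entries of the Kronecker product in Eq. 7. The sketch below uses hypothetical numbers throughout: the between-environment matrix, the coancestry-derived matrix (called `A` in the code) and the helpers `separable_variance` and `position` are illustrative assumptions, not values or routines from the study.

```python
import numpy as np

# Hypothetical covariance between the three environments.
g_e = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.2, 0.4],
    [0.3, 0.4, 0.9],
])

# Hypothetical coancestry-derived matrix for three inbred lines.
A = np.array([
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])


def separable_variance(g_e, g_v):
    """Variance of the stacked effect vector under the separable form."""
    return np.kron(g_e, g_v)


def position(env, line, n_lines):
    """Index of (environment, line) when lines are nested within environments."""
    return env * n_lines + line


n_lines = A.shape[0]
var_marker = separable_variance(g_e, A)
var_identity = separable_variance(g_e, np.eye(n_lines))

# Covariance between line 0 in environment 0 and line 1 in environment 2.
i = position(0, 0, n_lines)
j = position(2, 1, n_lines)
cov_marker = var_marker[i, j]      # product of g_e[0, 2] and A[0, 1]
cov_identity = var_identity[i, j]  # zero: distinct lines are treated as unrelated

print(cov_marker, cov_identity)
```

Under the identity matrix, information is shared only between environments for the same line; with a coancestry-derived matrix, related lines also borrow information from each other.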

The variance of each vector of residuals $\boldsymbol{e}_i$ that makes up vector $\boldsymbol{e}$ in Eq. 3 is modeled as a separable process in the direction of rows and columns, so we can write $\mathrm{Var}(\boldsymbol{e}_i) = \boldsymbol{\Sigma}_{ic} \otimes \boldsymbol{\Sigma}_{ir}$, where $\otimes$ denotes the Kronecker product. The matrices $\boldsymbol{\Sigma}_{ic}$ and $\boldsymbol{\Sigma}_{ir}$ are either identity matrices or contain first-order autoregressive correlations to account for spatial variation, as described in Gilmour et al. (1997), Smith et al. (2001) and Oakey et al. (2007). Table 4 gives an overview of the final model for the variance structure of vectors $\boldsymbol{g}$ and $\boldsymbol{e}$ for each trait.
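A separable first-order autoregressive residual structure can be sketched in a few lines. The example below is an illustration under stated assumptions: the field dimensions, the autocorrelation parameters, the residual variance `sigma2` used as an explicit scale factor, and the helper names `ar1_correlation` and `plot_index` are all hypothetical. Setting an autocorrelation to zero gives the identity matrix, which corresponds to the alternative mentioned in the text.

```python
import numpy as np


def ar1_correlation(n, rho):
    """First-order autoregressive correlation matrix: entry (i, j) is rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def plot_index(row, col, n_rows):
    """Index of a plot when rows are nested within columns."""
    return col * n_rows + row


# Hypothetical field layout and spatial parameters for one location.
n_rows, n_cols = 4, 3
rho_row, rho_col = 0.5, 0.3
sigma2 = 2.0  # assumed residual variance, used here as a scale factor

sigma_c = ar1_correlation(n_cols, rho_col)
sigma_r = ar1_correlation(n_rows, rho_row)

# Separable residual variance: columns (outer factor) by rows (inner factor).
var_e = sigma2 * np.kron(sigma_c, sigma_r)

# Covariance between the plot at (row 0, column 0) and the plot at (row 2, column 1).
cov = var_e[plot_index(0, 0, n_rows), plot_index(2, 1, n_rows)]
print(cov)

# With zero autocorrelation the correlation matrix reduces to the identity.
print(np.array_equal(ar1_correlation(n_rows, 0.0), np.eye(n_rows)))
```

The covariance between two plots decays with their separation in each direction independently, so plots that are two rows and one column apart are correlated by `rho_row ** 2 * rho_col`.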

Table 4 Summary of the variance structures fitted on the measurements of the validation data set for the traits grain yield, grain moisture content and days until flowering

About this article

Cite this article

Maenhout, S., De Baets, B. & Haesaert, G. Prediction of maize single-cross hybrid performance: support vector machine regression versus best linear prediction. Theor Appl Genet 120, 415–427 (2010). https://doi.org/10.1007/s00122-009-1200-5

Received: 31 March 2009
Accepted: 22 October 2009
Published: 11 November 2009
Issue Date: January 2010
DOI: https://doi.org/10.1007/s00122-009-1200-5
