Abstract
Accurate prediction of the phenotypic performance of a hybrid plant based on the molecular fingerprints of its parents should lead to a more cost-effective breeding programme, as it reduces the number of expensive field evaluations required. The construction of a reliable prediction model requires a representative sample of hybrids for which both molecular and phenotypic information is accessible. This phenotypic information is usually readily available, as typical breeding programmes test numerous new hybrids in multi-location field trials on a yearly basis. Earlier studies indicated that a linear mixed model analysis of these typically unbalanced phenotypic data allows the construction of ε-insensitive support vector machine regression and best linear prediction models for predicting the performance of single-cross maize hybrids. We compare these prediction methods using different subsets of the phenotypic and marker data of a commercial maize breeding programme and evaluate the resulting prediction accuracies by means of a specifically designed field experiment. This balanced field trial allows us to assess the reliability of the cross-validation prediction accuracies reported here and in earlier studies. The limits of the predictive capabilities of both prediction methods are further examined by reducing the number of training hybrids and the size of the molecular fingerprints. The results indicate a considerable discrepancy between prediction accuracies obtained by cross-validation procedures and those obtained by correlating the predictions with the results of a validation field trial. The prediction accuracy of best linear prediction was less sensitive to a reduction in the number of training examples than that of support vector machine regression. The latter was, however, better at predicting hybrid performance when the size of the molecular fingerprints was reduced, especially if the initial set of markers had a low information content.
References
Bernardo R (1993) Estimation of coefficient of coancestry using molecular markers in maize. Theor Appl Genet 85:1055–1062
Bernardo R (1994) Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci 34:20–25
Bernardo R (1995) Genetic models for predicting maize single-cross performance in unbalanced yield trial data. Crop Sci 35:141–147
Bernardo R (1996a) Best linear unbiased prediction of the performance of crosses between untested maize inbreds. Crop Sci 36:50–56
Bernardo R (1996b) Best linear unbiased prediction of maize single-cross performance. Crop Sci 36:872–876
Bernardo R (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci 48:1649–1664
Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Disc 2:121–167
Charcosset A, Bonnisseau B, Touchebeuf O, Burstin J, Dubreuil P, Barriére Y, Gallais A, Denis JB (1998) Prediction of maize hybrid silage performance using marker data: comparison of several models for specific combining ability. Crop Sci 38:38–44
Cullis B, Gogel B, Verbyla A, Thompson R (1998) Spatial analysis of multi-environment early generation trials. Biometrics 54:1–18
Frisch M, Thiemann A, Fu J, Schrag TA, Scholten S, Melchinger AE (2009) Transcriptome-based distance measures for grouping of germplasm and prediction of hybrid performance in maize. Theor Appl Genet (in press)
Gilmour AR, Cullis BR, Verbyla AP (1997) Accounting for natural and extraneous variation in the analysis of field experiments. J Agric Biol Environ Stat 2:269–293
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Global Optim 13:455–492
Laloë D (1993) Precision and information in linear models of genetic evaluation. Genet Sel Evol 25:557–576
Maenhout S, De Baets B, Haesaert G, Van Bockstaele E (2007) Support vector machine regression for the prediction of maize hybrid performance. Theor Appl Genet 115:1003–1013
Maenhout S, De Baets B, Haesaert G, Van Bockstaele E (2008) Marker-based screening of maize inbred lines using support vector machine regression. Euphytica 161:123–131
Maenhout S, De Baets B, Haesaert G (2009) Marker-based estimation of the coefficient of coancestry in hybrid breeding programmes. Theor Appl Genet 118:1181–1192
Oakey H, Verbyla AP, Cullis BR, Wei X, Pitchford WS (2007) Joint modeling of additive and non-additive (genetic line) effects in multi-environment trials. Theor Appl Genet 114:1319–1332
Schrag TA, Maurer HP, Melchinger AE, Piepho HP, Peleman J, Frisch M (2007) Prediction of single-cross hybrid performance in maize using haplotype blocks associated with QTL for grain yield. Theor Appl Genet 114:1345–1355
Schrag TA, Möhring J, Maurer HP, Dhillon BS, Melchinger AE, Piepho HP, Sorensen AP, Frisch M (2009) Molecular marker-based prediction of hybrid performance in maize using unbalanced data from multiple experiments with factorial crosses. Theor Appl Genet 118:741–751
Schrag TA, Möhring J, Kusterer B, Dhillon BS, Melchinger AE, Piepho HP, Frisch M (2009) Hybrid performance prediction in maize using molecular markers and joint analyses of hybrids and parental inbreds. Theor Appl Genet (in press)
Smith A, Cullis B, Thompson R (2001) Analyzing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57:1138–1147
Smola A, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14:199–222
Stuber C, Cockerham C (1966) Gene effects and variances in hybrid populations. Genetics 54:1279–1286
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Welham SJ, Cullis BR, Gogel BJ, Gilmour AR, Thompson R (2004) Prediction in linear mixed models. Aust NZ J Stat 46:325–347
Acknowledgments
The authors would like to thank the people from RAGT R2n for their unreserved and open-minded scientific contribution to this research. We also gratefully acknowledge the helpful comments and suggestions of two anonymous referees.
Additional information
Communicated by M. Cooper.
Contribution to the special issue “Heterosis in Plants”.
Appendix: Variance structure of the linear mixed models fitted to the phenotypic data of the validation field trial
The four random vectors \(\boldsymbol{c}\), \(\boldsymbol{a}_{\rm s}\), \(\boldsymbol{a}_{\rm o}\) and \(\boldsymbol{d}\) of Eq. 4 are assumed to be mutually independent. Furthermore, for each of these vectors \(\boldsymbol{h} \in \{ \boldsymbol{c}, \boldsymbol{a}_{\rm s}, \boldsymbol{a}_{\rm o}, \boldsymbol{d} \}\) we assume that the variance has the separable form
\[\operatorname{Var}(\boldsymbol{h}) = \boldsymbol{g}_e \otimes \boldsymbol{g}_{\rm v} \tag{7}\]
where \(\otimes\) denotes the Kronecker product. \(\boldsymbol{g}_e\) represents a 3 × 3 symmetric matrix containing the covariance between environments, while \(\boldsymbol{g}_{\rm v}\) represents the covariance between the specified genetic components of the validation trial entries. We start by fitting a completely unstructured variance matrix for \(\boldsymbol{g}_e\) while assuming an identity matrix for \(\boldsymbol{g}_{\rm v}.\) In subsequent steps, the number of variance components estimated by restricted maximum likelihood (REML) is reduced by fitting more parsimonious variance models for \(\boldsymbol{g}_e\). Candidate models are compared by restricted maximum likelihood ratio tests when they are nested, or by Akaike’s information criterion (AIC) otherwise. We attempt to fit a first-order factor analytic variance model such that \(\boldsymbol{g}_e=\boldsymbol{\lambda}\boldsymbol{\lambda}^{\prime}+ \boldsymbol{\Psi}\), where \(\boldsymbol{\lambda}\) is a vector of factor loadings and \(\boldsymbol{\Psi}\) is a diagonal matrix containing three location-specific variances (Smith et al. 2001). To obtain a more parsimonious model, the specific variances are sometimes made equal or set to zero (giving perfect correlation), and/or the loadings are made equal (giving a common covariance; Cullis et al. 1998). In a subsequent reduction, the variances on the diagonal are set equal, which results in a compound symmetry model. The simplest model for \(\boldsymbol{g}_e\) assumes zero covariance and equal variances.
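The first-order factor analytic structure and the separable form can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function name `fa1_covariance`, the loadings, the specific variances and the number of genetic effects `n_v` are all hypothetical, and the Kronecker ordering assumes that effects are sorted by environment first.

```python
import numpy as np

def fa1_covariance(loadings, specific_variances):
    """First-order factor analytic covariance: lambda lambda' + Psi."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    psi = np.diag(np.asarray(specific_variances, dtype=float))
    return lam @ lam.T + psi

# Hypothetical loadings and specific variances for three environments
# (illustrative numbers only, not estimates from the validation trial).
g_e = fa1_covariance([1.0, 0.8, 0.5], [0.2, 0.3, 0.1])

# Equal loadings and equal specific variances give compound symmetry:
# one common variance on the diagonal, one common covariance elsewhere.
g_cs = fa1_covariance([0.7, 0.7, 0.7], [0.4, 0.4, 0.4])

# Separable variance of a random vector with n_v genetic effects per
# environment and an identity matrix for the genetic component
# (assumed ordering: environment-major).
n_v = 4
var_h = np.kron(g_e, np.eye(n_v))
```

With these numbers, `var_h` is a 12 × 12 matrix whose off-diagonal blocks are the between-environment covariances in `g_e` multiplied by the identity matrix, which is the starting model described above.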
Once the most parsimonious model for \(\boldsymbol{g}_e\) is determined, we try different formulations for \(\boldsymbol{g}_{\rm v}.\) We fit an identity matrix for the variance model of the six check varieties in vector \(\boldsymbol{c}\) as no molecular marker or pedigree information is available for these varieties. For the vectors \(\boldsymbol{a}_{\rm s}\) and \(\boldsymbol{a}_{\rm o},\) containing the GCA effects of the inbred lines, we try to fit the different coefficient of coancestry derived matrices \(\boldsymbol{a}\) described by Maenhout et al. (2009) or an identity matrix. In a similar way, we compare the different coefficient of fraternity-based matrices \(\boldsymbol{d}\) for the variance matrix \(\boldsymbol{g}_{\rm v}\) pertaining to the vector \(\boldsymbol{d}.\) Sometimes, the most parsimonious model is obtained by not using the separable form of Eq. 7 but directly fitting a common GCA or SCA effect for all three locations.
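Alternative matrices for the genetic component are generally not nested, so one way to rank them is by AIC. This assumes the same criterion as for the environment model, which the text does not state explicitly. A minimal sketch follows; the model labels, log-likelihoods and parameter counts are hypothetical placeholders, and REML-based AIC values are only comparable between models that share the same fixed effects.

```python
def aic(log_likelihood, n_variance_parameters):
    """Akaike's information criterion: -2 log L + 2 k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_variance_parameters

# Hypothetical REML log-likelihoods and numbers of variance parameters
# for two candidate variance models of the GCA effects.
candidates = {
    "identity": (-512.4, 1),
    "coancestry_based": (-508.9, 1),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest AIC
```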
The variance of each vector of residuals \(\boldsymbol{e}_i\) that make up vector \(\boldsymbol{e}\) in Eq. 3 is modeled as a separable process in the direction of rows and columns, so we can write \(\operatorname{Var}(\boldsymbol{e}_i)=\boldsymbol{\Sigma}_{ic} \otimes \boldsymbol{\Sigma}_{ir}\), where \(\otimes\) denotes the Kronecker product. The matrices \(\boldsymbol{\Sigma}_{ic}\) and \(\boldsymbol{\Sigma}_{ir}\) are either identity matrices or contain first-order autoregressive correlations to account for spatial variation, as described in Gilmour et al. (1997), Smith et al. (2001) and Oakey et al. (2007). Table 4 gives an overview of the final model for the variance structure of vectors \(\boldsymbol{g}\) and \(\boldsymbol{e}\) for each trait.
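The separable row-by-column residual structure can be sketched in the same way. This is an illustration under stated assumptions rather than the fitted model: the helper `ar1_correlation`, the field dimensions, the autocorrelation parameters and the scalar `error_variance` are hypothetical, and plots are assumed to be ordered by column first, then by row within column.

```python
import numpy as np

def ar1_correlation(n, rho):
    """n x n first-order autoregressive correlation matrix, rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Hypothetical field layout and autocorrelations for a single trial.
n_cols, n_rows = 3, 4
sigma_c = ar1_correlation(n_cols, 0.3)  # correlation between columns
sigma_r = ar1_correlation(n_rows, 0.6)  # correlation between rows

# Assumed scalar residual variance scaling the separable correlation.
error_variance = 1.5
var_e = error_variance * np.kron(sigma_c, sigma_r)
```

Setting `rho` to zero in `ar1_correlation` returns an identity matrix, which corresponds to the case without spatial correlation in that direction.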
Cite this article
Maenhout, S., De Baets, B. & Haesaert, G. Prediction of maize single-cross hybrid performance: support vector machine regression versus best linear prediction. Theor Appl Genet 120, 415–427 (2010). https://doi.org/10.1007/s00122-009-1200-5
DOI: https://doi.org/10.1007/s00122-009-1200-5