
Walking through the statistical black boxes of plant breeding

  • Review
Theoretical and Applied Genetics

Abstract

Key message

The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models.
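A small numerical check can make the key message concrete. The sketch below, which uses made-up data and treats the variance components as known, solves Henderson's mixed model equations and compares the result with the Gaussian-process form, in which the genetic effects are modelled as \(\mathbf{u} \sim N(\mathbf{0}, \mathbf{K}\sigma_a^2)\) and the prediction is the conditional mean of \(\mathbf{u}\) given \(\mathbf{y}\). The matrix `K`, the sizes, and the variance values are illustrative assumptions, not values from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 8, 5                                  # 8 records, 5 genotypes (hypothetical sizes)
X = np.ones((n, 1))                          # fixed effects: overall mean only
Z = np.eye(q)[rng.integers(0, q, size=n)]    # incidence: each record belongs to one genotype
F = rng.normal(size=(q, 20))
K = F @ F.T / 20 + 0.1 * np.eye(q)           # any positive-definite covariance among genotypes
y = rng.normal(10.0, 2.0, size=n)            # made-up phenotypes
sa2, se2 = 4.0, 7.0                          # variance components, assumed known

# (1) Henderson's mixed model equations, with lambda = se2 / sa2
lam = se2 / sa2
C = np.block([[X.T @ X, X.T @ Z],
              [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(K)]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(C, rhs)
b_mme, u_mme = sol[:1], sol[1:]

# (2) Gaussian-process view: u ~ N(0, K sa2), y ~ N(Xb, V) with V = Z K Z' sa2 + I se2
V = sa2 * Z @ K @ Z.T + se2 * np.eye(n)
V_inv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)   # generalized least squares (BLUE)
u_gp = sa2 * K @ Z.T @ V_inv @ (y - X @ b_gls)              # conditional mean of u given y (BLUP)

print(np.allclose(b_mme, b_gls), np.allclose(u_mme, u_gp))  # True True
```

The two routes give the same estimates; the mixed model equations are usually preferred in practice because they avoid inverting the \(n \times n\) matrix \(\mathbf{V}\).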

Abstract

Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) as black boxes: complex pieces of machinery whose contents the user does not fully understand, so that the user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
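To make aims (3) and (4) more tangible, here is a minimal sketch of a mixed-model association scan on simulated data. It assumes a genomic relationship matrix built by centering 0/1/2 marker codes by twice the allele frequency and scaling by \(2\sum p(1-p)\), and it tests each marker by generalized least squares with the covariance \(\mathbf{V}\) held fixed across markers (an EMMAX-style shortcut). The variance components are simply assumed here rather than estimated, and all names and data are hypothetical; this is not the review's own algorithm.

```python
import numpy as np

def genomic_relationship(M):
    """Genomic relationship matrix from an n x m marker matrix coded 0/1/2:
    center each marker by twice its allele frequency, then scale by 2 * sum(p(1-p))."""
    p = M.mean(axis=0) / 2.0
    W = M - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))

def marker_scan(y, M, V):
    """Wald chi-square (1 df) for each marker fitted as a fixed effect by generalized
    least squares, with the covariance V held fixed across markers."""
    n, m = M.shape
    L = np.linalg.cholesky(V)                # V = L L'
    # Whitening: premultiplying by L^-1 turns the error covariance into I
    y_w = np.linalg.solve(L, y)
    M_w = np.linalg.solve(L, M)
    one_w = np.linalg.solve(L, np.ones(n))
    stats = np.empty(m)
    for j in range(m):
        Xj = np.column_stack([one_w, M_w[:, j]])
        XtX_inv = np.linalg.inv(Xj.T @ Xj)   # covariance of the GLS estimates
        beta = XtX_inv @ (Xj.T @ y_w)
        stats[j] = beta[1] ** 2 / XtX_inv[1, 1]
    return stats

# Simulated data (hypothetical): 200 lines, 300 markers, one causal marker.
rng = np.random.default_rng(42)
n, m, causal = 200, 300, 17
M = rng.binomial(2, 0.5, size=(n, m)).astype(float)
y = 10.0 + 2.0 * M[:, causal] + rng.normal(0.0, 1.0, size=n)

sa2, se2 = 1.0, 1.0                          # assumed known; normally estimated under the null model
V = sa2 * genomic_relationship(M) + se2 * np.eye(n)
stats = marker_scan(y, M, V)
print(int(np.argmax(stats)), stats.max())
```

Holding \(\mathbf{V}\) fixed is what makes such scans fast: the costly decomposition is done once, and each marker then needs only a two-column least-squares fit. The statistics can be compared with a \(\chi^2_1\) distribution to obtain p-values.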

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Author information

Correspondence to Katy Martin Rainey.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by R. K. Varshney.

Appendix: Numerical example of design matrices

Suppose that a breeding program is testing a three-way hybrid (\({\text{A}} \times {\text{B}} \times {\text{C}}\)) to estimate the narrow-sense heritability of a trait of interest. The only genetic information available is a short pedigree describing the three-way cross:

figure a
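Because the pedigree figure is not reproduced in this extract, the sketch below rests on an assumed reading of it: D is the single cross A × B and E is the three-way hybrid D × C, which is consistent with the text but is not confirmed by it. It builds the additive (numerator) relationship matrix with the tabular method, treating the founders A, B and C as unrelated and non-inbred; that is also an assumption and may not match how the manuscript handled inbred parental lines.

```python
import numpy as np

def numerator_relationship(sires, dams):
    """Additive (numerator) relationship matrix by the tabular method.

    Individuals must be ordered so that parents precede offspring; -1 marks an
    unknown parent. Founders are treated as unrelated and non-inbred.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            # Relationship of i with an earlier individual j: half the sum of
            # j's relationships with i's known parents
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
        # Diagonal: 1 + F_i, where the inbreeding coefficient F_i = 0.5 * a(sire, dam)
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A

names = ["A", "B", "C", "D", "E"]
sires = [-1, -1, -1, 0, 3]   # assumed reading: D = A x B, E = D x C
dams = [-1, -1, -1, 1, 2]
A = numerator_relationship(sires, dams)
print(A)
```

Under these assumptions E is related by 0.25 to A and B, and by 0.5 to C and D.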

The evaluation was conducted in a single environment, with two replicates each of the parents (\({\text{A}},{\text{B}},{\text{C}}\)) and the final hybrid (\({\text{E}}\)). Because one plot of genotype C was lost during the growing season, the design matrices are:

figure b
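The design matrices themselves are not reproduced in this extract, so the sketch below rebuilds their structure under stated assumptions. \(\mathbf{X}\) is taken to be a single column of ones, which is consistent with the single fixed effect reported below. \(\mathbf{Z}\) is taken to have one column per genotype (A, B, C, D, E), and the plot order A, A, B, B, C, E, E is hypothetical.

```python
import numpy as np

genotypes = ["A", "B", "C", "D", "E"]   # columns of Z (random genetic effects)
# Hypothetical plot layout: two plots each of A, B and E, one surviving plot of C.
plots = ["A", "A", "B", "B", "C", "E", "E"]

X = np.ones((len(plots), 1))            # fixed effect: overall mean only
Z = np.zeros((len(plots), len(genotypes)))
for row, g in enumerate(plots):
    Z[row, genotypes.index(g)] = 1.0    # one 1 per row, in that plot's genotype column

print(Z.sum(axis=0))                    # plots per genotype: [2. 2. 1. 0. 2.]
```

The all-zero column for D is intentional. D has no plots, yet it keeps a column so that its breeding value can still be predicted through the relationship matrix.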

This example was analyzed with the Gibbs sampling algorithm described in the manuscript, using the suggested prior (\(\nu^{*} = 5\) and \(S^{*} = 0.5 \times \text{var}(\mathbf{y}) = 5.17\)). The results were:

$$\textbf{b} = [23.812]\quad \textbf{u} = \left[ {\begin{array}{*{20}c} {1.191} \\ {0.172} \\ { - 1.291} \\ {0.799} \\ { - 0.060} \\ \end{array} } \right]\quad \sigma _{a}^{2} = 4.004\quad \sigma _{e}^{2} = 6.987$$

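which yields a narrow-sense heritability of \(h^{2} = \sigma_{a}^{2}/(\sigma_{a}^{2} + \sigma_{e}^{2}) = 4.004/(4.004 + 6.987) \approx 0.364\), and breeding values (\(\mathbf{u}\)) for all five genotypes, including the parental line D, which was not grown in the field.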
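For readers who want to experiment, the sketch below is a blocked Gibbs sampler for \(\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e}\). It draws \(\mathbf{b}\) and \(\mathbf{u}\) jointly, so it may differ from the manuscript's algorithm. It assumes a flat prior on \(\mathbf{b}\) and scaled inverse chi-square priors with \(\nu^{*}\) degrees of freedom and scale \(S^{*}\) on both variances, which is one common parametrization. The relationship matrix and plot layout use the same assumed pedigree reading (D = A × B, E = D × C), and the phenotypes are made-up placeholders. Running it will not reproduce the numbers above; only the final line recomputes \(h^2\) from the reported variances.

```python
import numpy as np

def heritability(sa2, se2):
    """Narrow-sense heritability: additive variance over phenotypic variance."""
    return sa2 / (sa2 + se2)

def gibbs_animal_model(y, X, Z, A, nu=5.0, S=None, n_iter=6000, burn_in=1000, seed=1):
    """Blocked Gibbs sampler for y = Xb + Zu + e, u ~ N(0, A sa2), e ~ N(0, I se2).

    Flat prior on b; scaled inverse chi-square priors (nu, S) on both variances.
    Returns posterior means after burn-in.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Z.shape[1]
    if S is None:
        S = 0.5 * np.var(y, ddof=1)          # prior scale 0.5 * var(y), as in the appendix
    A_inv = np.linalg.inv(A)
    W = np.hstack([X, Z])                    # combined incidence matrix [X Z]
    WtW, Wty = W.T @ W, W.T @ y
    sa2 = se2 = S                            # starting values
    draws = []
    for it in range(n_iter):
        # theta = [b; u] given sa2, se2, y  ~  N(C^-1 W'y, C^-1 se2)
        C = WtW.copy()
        C[p:, p:] += A_inv * (se2 / sa2)
        C_inv = np.linalg.inv(C)
        theta = C_inv @ Wty + np.linalg.cholesky(C_inv * se2) @ rng.standard_normal(p + q)
        u = theta[p:]
        e = y - W @ theta
        # Variances given the rest: (sum of squares + nu * S) / chi-square(df + nu)
        sa2 = (u @ A_inv @ u + nu * S) / rng.chisquare(q + nu)
        se2 = (e @ e + nu * S) / rng.chisquare(n + nu)
        if it >= burn_in:
            draws.append(np.concatenate([theta, [sa2, se2]]))
    mean = np.mean(draws, axis=0)
    return {"b": mean[:p], "u": mean[p:p + q], "sa2": mean[-2], "se2": mean[-1]}

# Hypothetical inputs: pedigree read as D = A x B and E = D x C; plot order A, A, B, B, C, E, E.
A = np.array([[1.0, 0.0, 0.0, 0.5, 0.25],
              [0.0, 1.0, 0.0, 0.5, 0.25],
              [0.0, 0.0, 1.0, 0.0, 0.5],
              [0.5, 0.5, 0.0, 1.0, 0.5],
              [0.25, 0.25, 0.5, 0.5, 1.0]])
X = np.ones((7, 1))
Z = np.zeros((7, 5))
for row, col in enumerate([0, 0, 1, 1, 2, 4, 4]):
    Z[row, col] = 1.0
y = np.array([26.0, 24.5, 23.0, 25.5, 20.0, 24.0, 22.5])   # made-up phenotypes

post = gibbs_animal_model(y, X, Z, A)
print(post["u"], heritability(post["sa2"], post["se2"]))
print(round(heritability(4.004, 6.987), 3))   # 0.364, from the variances reported above
```

Drawing \(\mathbf{b}\) and \(\mathbf{u}\) as one block is practical only for small problems like this one, because it inverts the full coefficient matrix at every iteration. Large evaluations usually update effects one at a time instead.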

About this article

Cite this article

Xavier, A., Muir, W.M., Craig, B. et al. Walking through the statistical black boxes of plant breeding. Theor Appl Genet 129, 1933–1949 (2016). https://doi.org/10.1007/s00122-016-2750-y


  • DOI: https://doi.org/10.1007/s00122-016-2750-y
