Walking through the statistical black boxes of plant breeding

Xavier, Alencar; Muir, William M.; Craig, Bruce; Rainey, Katy Martin

doi:10.1007/s00122-016-2750-y


Review
Published: 19 July 2016

Volume 129, pages 1933–1949 (2016)

Theoretical and Applied Genetics

Alencar Xavier¹, William M. Muir², Bruce Craig³ & Katy Martin Rainey¹


Abstract

Key message

The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models.
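To make the key message concrete, the sketch below (Python with NumPy) solves Henderson's mixed model equations for $\mathbf{y} = X\mathbf{b} + Z\mathbf{u} + \mathbf{e}$ with $\mathbf{u} \sim N(0, K\sigma_u^2)$ and checks that the resulting BLUP equals the Gaussian-process conditional mean $\sigma_u^2 K Z^\top V^{-1}(\mathbf{y} - X\hat{\mathbf{b}})$, where $V = \sigma_u^2 Z K Z^\top + \sigma_e^2 I$. The function names, the simulated data and the variance values are illustrative assumptions, and the variance components are treated as known.

```python
import numpy as np

def mme_blup(y, X, Z, K, var_u, var_e):
    """Solve Henderson's mixed model equations for fixed effects b and BLUP u."""
    lam = var_e / var_u                      # variance ratio sigma_e^2 / sigma_u^2
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(K)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

def gp_posterior_mean(y, X, Z, K, var_u, var_e):
    """Same quantities written as GLS for b plus the Gaussian-process conditional mean of u."""
    V = var_u * Z @ K @ Z.T + var_e * np.eye(len(y))
    Vinv = np.linalg.inv(V)
    b = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    u = var_u * K @ Z.T @ Vinv @ (y - X @ b)
    return b, u

rng = np.random.default_rng(1)
n, q = 12, 5
X = np.ones((n, 1))                                   # intercept only
Z = np.zeros((n, q))
Z[np.arange(n), rng.integers(0, q, n)] = 1.0          # each plot belongs to one genotype
M = rng.normal(size=(q, 20))
K = M @ M.T / 20 + 0.1 * np.eye(q)                    # any positive-definite kernel
y = 10 + rng.normal(size=n)

b1, u1 = mme_blup(y, X, Z, K, var_u=2.0, var_e=3.0)
b2, u2 = gp_posterior_mean(y, X, Z, K, var_u=2.0, var_e=3.0)
print(np.allclose(b1, b2), np.allclose(u1, u2))       # expected: True True
```

The mixed-model form only needs $K^{-1}$ and a system the size of the number of effects, while the Gaussian-process form works with the $n \times n$ matrix $V$; which is cheaper depends on whether there are more observations or more effects.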

Abstract

Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) as black boxes: complex pieces of machinery whose contents are not fully understood by the user, who sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and their applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
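For aim (3), moving from pedigree-based to genomic-based prediction usually amounts to replacing the pedigree relationship matrix with a marker-based one. The sketch below assumes genotypes coded as 0/1/2 copies of a counted allele and uses VanRaden-style centring and scaling, which is one common choice among several; the function name and the simulated genotypes are hypothetical.

```python
import numpy as np

def genomic_relationship(M):
    """Marker-based relationship matrix from an n x m genotype matrix coded 0/1/2.

    Assumed scaling: G = W W' / (2 * sum p(1 - p)), where W is the genotype
    matrix centred by twice the allele frequency of each marker.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # frequency of the counted allele per marker
    keep = (p > 0) & (p < 1)                 # monomorphic markers carry no information
    W = M[:, keep] - 2.0 * p[keep]           # centre each marker column
    return W @ W.T / (2.0 * np.sum(p[keep] * (1.0 - p[keep])))

rng = np.random.default_rng(7)
freqs = rng.uniform(0.1, 0.9, size=500)
M = rng.binomial(2, freqs, size=(20, 500))   # 20 lines x 500 simulated SNPs
G = genomic_relationship(M)
print(G.shape, np.allclose(G, G.T))          # expected: (20, 20) True
```

Because the columns of W are centred, every row of G sums to zero, so G is singular; in practice a small constant is often added to its diagonal before it is inverted in the mixed model equations.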



Author information

Authors and Affiliations

Department of Agronomy, Purdue University, 915 W. State St., West Lafayette, IN, 47907, USA
Alencar Xavier & Katy Martin Rainey
Department of Animal Science, Purdue University, 150 N. University St., West Lafayette, IN, 47907, USA
William M. Muir
Department of Statistics, Purdue University, 915 W. State St., West Lafayette, IN, 47907, USA
Bruce Craig


Corresponding author

Correspondence to Katy Martin Rainey.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by R. K. Varshney.

Appendix: Numerical example of design matrices

Suppose that a breeding program is testing a three-way hybrid ($\text{A} \times \text{B} \times \text{C}$) to estimate the narrow-sense heritability of a trait of interest. The only genetic information available is a short pedigree describing the three-way cross, as follows:
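A pedigree like this is typically turned into the additive (numerator) relationship matrix A used in the mixed model. The sketch below applies the standard tabular method under two assumptions that this appendix does not state: that D is the single cross A × B and E = D × C (the usual reading of a three-way hybrid A × B × C), and that A, B and C are unrelated, non-inbred founders. The function name and the index-based pedigree encoding are hypothetical.

```python
import numpy as np

def numerator_relationship(sires, dams):
    """Additive (numerator) relationship matrix by the tabular method.

    Individuals must be ordered so that parents come before their offspring;
    None marks an unknown (founder) parent. Founders are treated as non-inbred.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Off-diagonals: half the sum of j's relationships with i's parents
        for j in range(i):
            a_ij = 0.0
            if s is not None:
                a_ij += 0.5 * A[j, s]
            if d is not None:
                a_ij += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a_ij
        # Diagonal: 1 + F_i, where F_i is half the relationship between i's parents
        A[i, i] = 1.0 + (0.5 * A[s, d] if s is not None and d is not None else 0.0)
    return A

# Hypothetical pedigree (assumption): D = A x B, E = D x C
names = ["A", "B", "C", "D", "E"]
sires = [None, None, None, 0, 3]
dams = [None, None, None, 1, 2]
A = numerator_relationship(sires, dams)
print(A)
```

Under these assumptions E has a relationship of 0.5 with C and with D, and 0.25 with each of A and B. If the parents were instead treated as fully inbred lines (F = 1), their diagonal elements would be 1 + F = 2 and the off-diagonal elements would change accordingly.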

The evaluation was conducted in a single environment, with two replicates each of the parents ($\text{A}, \text{B}, \text{C}$) and the final hybrid ($\text{E}$). Because one plot of genotype C was lost during the growing season, the design matrices are:
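The incidence structure can be sketched from the layout just described. The plot order below is hypothetical; what the text fixes is seven observed plots (two each of A, B and E, one of C), a single fixed effect (the reported $\mathbf{b}$ has one element, so X is a column of ones) and five genotype columns in Z, with an empty column for D because it was never grown.

```python
import numpy as np

genotypes = ["A", "B", "C", "D", "E"]          # columns of Z; D has no plots
plots = ["A", "A", "B", "B", "C", "E", "E"]    # hypothetical plot order; one C plot lost

X = np.ones((len(plots), 1))                   # overall mean only
Z = np.zeros((len(plots), len(genotypes)))
for row, g in enumerate(plots):
    Z[row, genotypes.index(g)] = 1.0           # plot `row` observes genotype g

print(Z.sum(axis=0))                           # plots per genotype: [2. 2. 1. 0. 2.]
```

The empty column for D is what lets the model still return a breeding value for D: it is predicted entirely through its pedigree relationships with the genotypes that were observed.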

The example above was run using the Gibbs sampling algorithm shown in the manuscript, with the prior suggested here ($\nu^{*} = 5$ and $S^{*} = 0.5 \times \operatorname{var}(\mathbf{y}) = 5.17$). The outcome was:

$$\textbf{b} = [23.812]\quad \textbf{u} = \left[ {\begin{array}{*{20}c} {1.191} \\ {0.172} \\ { - 1.291} \\ {0.799} \\ { - 0.060} \\ \end{array} } \right]\quad \sigma _{a}^{2} = 4.004\quad \sigma _{e}^{2} = 6.987$$

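which yields a narrow-sense heritability of $h^{2} = \sigma_{a}^{2}/(\sigma_{a}^{2} + \sigma_{e}^{2}) = 4.004/(4.004 + 6.987) \approx 0.364$, and breeding values ($\mathbf{u}$) for all genotypes, including the parental line D, which was not grown in the field.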
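As a rough illustration of how such an analysis can be run, the sketch below implements one common form of Gibbs sampler for $\mathbf{y} = X\mathbf{b} + Z\mathbf{u} + \mathbf{e}$ with $\mathbf{u} \sim N(0, A\sigma_a^2)$: $\mathbf{b}$ and $\mathbf{u}$ are drawn jointly from their normal full conditional, and both variances are drawn from scaled inverse chi-square full conditionals. It is not the authors' implementation. The function name, the parameterization of the prior (with $S^{*}$ acting as a prior guess of each variance), the default $S^{*} = 0.5 \times \operatorname{var}(\mathbf{y})$ and the simulated data standing in for field observations are all assumptions of this sketch.

```python
import numpy as np

def gibbs_animal_model(y, X, Z, A, nu=5.0, S=None, n_iter=5000, burn_in=1000, seed=0):
    """Block Gibbs sampler for y = Xb + Zu + e, u ~ N(0, A*va), e ~ N(0, I*ve).

    Assumed priors: va, ve ~ scaled inverse chi-square(nu, S).
    Returns posterior means of b, u, va and ve.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    q = Z.shape[1]
    if S is None:
        S = 0.5 * np.var(y, ddof=1)              # assumed default: half the phenotypic variance
    W = np.hstack([X, Z])
    WtW, Wty = W.T @ W, W.T @ y
    Ainv = np.linalg.inv(A)
    va, ve = S, S                                # starting values
    keep = []
    for it in range(n_iter):
        # 1) (b, u) jointly from N(C^-1 W'y, C^-1 * ve), with C the MME left-hand side
        C = WtW.copy()
        C[p:, p:] += Ainv * (ve / va)
        L = np.linalg.cholesky(C)                # C = L L'
        mean = np.linalg.solve(C, Wty)
        noise = np.linalg.solve(L.T, rng.standard_normal(p + q))  # covariance C^-1
        theta = mean + noise * np.sqrt(ve)
        b, u = theta[:p], theta[p:]
        # 2) Variances from their scaled inverse chi-square full conditionals
        va = (u @ Ainv @ u + nu * S) / rng.chisquare(q + nu)
        e = y - X @ b - Z @ u
        ve = (e @ e + nu * S) / rng.chisquare(n + nu)
        if it >= burn_in:
            keep.append(np.concatenate([b, u, [va, ve]]))
    post = np.array(keep).mean(axis=0)
    return post[:p], post[p:p + q], post[-2], post[-1]

# Simulated stand-in data (hypothetical): 30 unrelated genotypes, 3 plots each
rng = np.random.default_rng(42)
q = 30
A = np.eye(q)
Z = np.kron(np.eye(q), np.ones((3, 1)))
X = np.ones((Z.shape[0], 1))
u_true = rng.normal(0.0, 2.0, q)
y = 20.0 + Z @ u_true + rng.normal(0.0, np.sqrt(6.0), Z.shape[0])

b_hat, u_hat, va_hat, ve_hat = gibbs_animal_model(y, X, Z, A, n_iter=3000, burn_in=500)
print(round(va_hat / (va_hat + ve_hat), 3))      # heritability from posterior means
```

Here heritability is computed from the posterior means of the two variances; averaging the per-iteration ratio $\sigma_a^2/(\sigma_a^2 + \sigma_e^2)$ over the retained samples is an equally common alternative and can give a slightly different value.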


About this article

Cite this article

Xavier, A., Muir, W.M., Craig, B. et al. Walking through the statistical black boxes of plant breeding. Theor Appl Genet 129, 1933–1949 (2016). https://doi.org/10.1007/s00122-016-2750-y


Received: 29 January 2016
Accepted: 01 July 2016
Published: 19 July 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s00122-016-2750-y
