A fast genomic selection approach for large genomic data

  • Original Article
  • Theoretical and Applied Genetics

Abstract

Key message

We propose a novel computational method for genomic selection that combines identical-by-state (IBS)-based Haseman–Elston (HE) regression and best linear prediction (BLP), called HE-BLP.

Abstract

Genomic best linear unbiased prediction (GBLUP) has been widely used in whole-genome prediction for breeding programs. To determine the total genetic variance of a training population, a linear mixed model (LMM) must be solved via restricted maximum likelihood (REML), whose computational complexity scales with the cube of the sample size. We propose a novel computational method, called HE-BLP, that combines identical-by-state (IBS)-based Haseman–Elston (HE) regression and best linear prediction (BLP). With this method, the total genetic variance is estimated by solving a simple HE linear regression, whose computational complexity scales with the square of the sample size. The method is therefore suitable for large-scale genomic data, except when environmental effects need to be estimated simultaneously, which HE regression does not allow. In Monte Carlo simulation studies, the heritability estimated by HE was identical to that estimated by REML, and the prediction accuracy of HE-BLP and traditional GBLUP was also quite similar when quantitative trait loci (QTLs) were randomly distributed along the genome and their effects followed a normal distribution. In addition, the kernel row number (KRN) trait in a maize IBM population was used to evaluate the performance of the two methods. The results showed similar prediction accuracy for breeding values despite slightly different heritability estimates from HE and REML, probably due to the underlying genetic architecture. HE-BLP can be a genomic selection method of choice for even larger sets of genomic data in the special cases where environmental effects can be ignored. The software for HE regression and the simulation program are available online in the Genetic Analysis Repository (GEAR; https://github.com/gc5k/GEAR/wiki).
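The two steps described in the abstract can be illustrated with a minimal sketch in Python. It is not the GEAR implementation, and it rests on assumptions that the abstract does not spell out: the relationship matrix is built from standardized genotypes, the HE regression uses the squared-difference form (squared phenotype differences regressed on pairwise relatedness, with heritability taken as minus half the slope), and the data are simulated with hypothetical sizes. The function names `standardize`, `he_heritability`, and `blp` are made up for this illustration. Note that the prediction step uses a dense linear solve for brevity; only the variance estimation step has the quadratic cost discussed above.

```python
import numpy as np

def standardize(genotypes):
    """Center and scale an (n, m) matrix of 0/1/2 allele counts per marker."""
    freq = genotypes.mean(axis=0) / 2.0
    poly = (freq > 0) & (freq < 1)          # drop monomorphic markers
    f = freq[poly]
    return (genotypes[:, poly] - 2 * f) / np.sqrt(2 * f * (1 - f))

def he_heritability(y, G):
    """HE regression (assumed squared-difference form).

    For standardized phenotypes, E[(y_i - y_j)^2] = 2 - 2 * h2 * G_ij,
    so h2 is estimated as minus half the regression slope over all pairs.
    """
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)     # all pairs i < j: O(n^2) work
    slope, _intercept = np.polyfit(G[i, j], (y[i] - y[j]) ** 2, 1)
    return -slope / 2.0

def blp(y_train, G_tt, G_nt, h2):
    """Best linear prediction of genetic values for new individuals."""
    h2 = min(max(h2, 1e-6), 1 - 1e-6)       # keep the variance ratio valid
    yc = y_train - y_train.mean()
    V = h2 * G_tt + (1 - h2) * np.eye(len(yc))
    return h2 * G_nt @ np.linalg.solve(V, yc)

# Simulated example (all sizes and parameters are hypothetical).
rng = np.random.default_rng(0)
n_train, n_test, m, true_h2 = 500, 100, 300, 0.5
p = rng.uniform(0.1, 0.9, size=m)
Z = standardize(rng.binomial(2, p, size=(n_train + n_test, m)))
G = Z @ Z.T / Z.shape[1]

# Every marker is a QTL with a normally distributed effect.
g = Z @ rng.normal(0, np.sqrt(true_h2 / Z.shape[1]), size=Z.shape[1])
y = g + rng.normal(0, np.sqrt(1 - true_h2), size=len(g))

tr = np.arange(n_train)
te = np.arange(n_train, n_train + n_test)
h2_hat = he_heritability(y[tr], G[np.ix_(tr, tr)])
g_hat = blp(y[tr], G[np.ix_(tr, tr)], G[np.ix_(te, tr)], h2_hat)
accuracy = np.corrcoef(g_hat, g[te])[0, 1]
print(f"estimated h2: {h2_hat:.2f}, prediction accuracy: {accuracy:.2f}")
```

Because the ratio of genetic to residual variance is all that enters the predictor, the sketch works with the heritability estimate directly rather than with separate variance components.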

Acknowledgements

The authors are grateful to the editor and the two anonymous reviewers for their constructive comments, and to Peter Bommert for providing information on the maize IBM population.

Author information

Corresponding authors

Correspondence to Hailan Liu or Guo-Bo Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Communicated by Mikko J. Sillanpää.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 33 KB)

About this article

Cite this article

Liu, H., Chen, GB. A fast genomic selection approach for large genomic data. Theor Appl Genet 130, 1277–1284 (2017). https://doi.org/10.1007/s00122-017-2887-3

  • DOI: https://doi.org/10.1007/s00122-017-2887-3
