Theoretical and Applied Genetics, Volume 132, Issue 4, pp 1211–1222

Efficient genetic value prediction using incomplete omics data

  • Matthias Westhues
  • Claas Heuer
  • Georg Thaller
  • Rohan Fernando
  • Albrecht E. Melchinger
Original Article


Key message

Covering a subset of individuals with a quantitative predictor, while imputing records for all others using pedigree or genomic data, could improve the precision of predictions while controlling for costs.


Predicting genetic values with high accuracy is pivotal for effective candidate selection in animal and plant breeding. Novel ‘omics’-based predictors have been shown to improve upon established genome-based predictions of important complex traits but require laborious and expensive assays. As a consequence, there are various datasets with full genetic marker coverage of all studied individuals but incomplete coverage with other ‘omics’ data. In animal breeding, single-step prediction was introduced to efficiently combine pedigree information, collected on a large number of animals, with genomic information, collected on a smaller subset of animals, for breeding value estimation without bias. Using two maize datasets of inbred lines and hybrids, we show that the single-step framework facilitates imputing transcriptomic data, which improves predictions when the predictive ability of the transcriptomic data exceeds that of pedigree or genomic data. Our results suggest that covering only a subset of inbred lines with ‘omics’ predictors and imputing all others using pedigree or genomic data could enable breeders to improve trait predictions while keeping costs under control. Employing ‘omics’ predictors could particularly improve candidate selection in hybrid breeding because the success of forecasts is a strongly convex function of predictive ability.
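The single-step idea summarized above can be illustrated with a minimal sketch. It assumes the combined relationship matrix takes the standard form from the animal-breeding single-step literature, with a base kernel `K` (pedigree or genomic) available for all individuals and an 'omics' kernel `T` available only for a subset. The function name `single_step_kernel`, the variable names, and the simulated data are illustrative assumptions and are not taken from the article's software.

```python
import numpy as np

def single_step_kernel(K, T, sub):
    """Combine a base kernel K (all n individuals) with a kernel T that is
    only available for the individuals indexed by `sub` (hypothetical helper).

    Blocks, with 1 = individuals without omics data and 2 = individuals with:
      H11 = K11 + K12 K22^-1 (T - K22) K22^-1 K21
      H12 = K12 K22^-1 T
      H22 = T
    """
    n = K.shape[0]
    sub = np.asarray(sub)
    rest = np.setdiff1d(np.arange(n), sub)

    K11 = K[np.ix_(rest, rest)]
    K12 = K[np.ix_(rest, sub)]
    K22 = K[np.ix_(sub, sub)]

    # B = K12 K22^-1, computed with a linear solve (K22 is symmetric).
    B = np.linalg.solve(K22, K12.T).T

    H = np.empty((n, n))
    H[np.ix_(rest, rest)] = K11 + B @ (T - K22) @ B.T
    H[np.ix_(rest, sub)] = B @ T
    H[np.ix_(sub, rest)] = T @ B.T
    H[np.ix_(sub, sub)] = T
    return H

# Simulated toy data (assumption): 6 individuals, 3 of them with omics records.
rng = np.random.default_rng(1)
markers = rng.standard_normal((6, 50))
K = markers @ markers.T / 50 + 1e-6 * np.eye(6)   # base kernel for everyone
sub = np.array([0, 2, 5])                          # individuals with omics data
omics = rng.standard_normal((3, 20))
T = omics @ omics.T / 20                           # omics kernel for the subset

H = single_step_kernel(K, T, sub)
```

In this construction the relationships among individuals with 'omics' records come directly from `T`, while all other entries are projected through the base kernel. When `T` equals the corresponding block of `K`, the function returns `K` unchanged, which is a convenient sanity check.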



We thank T. A. Schrag from the University of Hohenheim for providing the phenotypic data, and S. Scholten, A. Thiemann and F. Seifert from the University of Hamburg for providing the gene expression data for Experiment 2. Furthermore, we would like to thank the researchers and institutions who contributed to the development of the maize diversity panel and associated data from Experiment 1, in particular Jianbing Yan and Haijun Liu from Huazhong Agricultural University, Wuhan, China. We thank T. A. Schrag and W. Molenaar for valuable suggestions for improving the content of this manuscript. The authors acknowledge support by the state of Baden-Württemberg through bwHPC. Financial support for M. W. was provided by the Fiat Panis foundation, Ulm, Germany.

Author contribution statement

MW and CH conceived the study. AEM, RF and GT guided the structure of the research and checked the methodology and results for validity. MW and CH drafted the manuscript. MW and CH implemented the prediction models and developed software. MW analyzed the data. All authors interpreted the results, read and approved the final version of the manuscript.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany
  2. Institute of Animal Breeding and Husbandry, Christian-Albrechts-University Kiel, Kiel, Germany
  3. Inguran, LLC dba STGenetics, Navasota, USA
  4. Department of Animal Science, Iowa State University, Ames, USA
