Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins
SNP chips are commonly used for genotyping animals in genomic selection, but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly spaced SNPs, increased minor allele frequencies (MAF), and SNP-trait associations, either for single traits independently or for all three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821–0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825–0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs associated with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes, whereas trait-associated SNPs improved genomic prediction accuracies. Thus, the optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications for the design of LD SNP chips for imputation-enabled genomic prediction.
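To make two of the selection factors concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes genotypes are coded as 0/1/2 allele counts, that "evenly spaced" can be approximated by picking one SNP per equal-width bin along a chromosome, and that imputation accuracy is measured as the genotype concordance rate. The function names (`minor_allele_frequency`, `select_ld_panel`, `imputation_accuracy`) and the binning rule are hypothetical and chosen only for illustration; trait association, the third factor, is not modeled here.

```python
from typing import Dict, List, Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one SNP from genotypes coded as 0, 1 or 2 copies of an allele."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def select_ld_panel(
    positions: Sequence[float],
    mafs: Sequence[float],
    n_bins: int,
    chrom_length: float,
) -> List[int]:
    """Pick at most one SNP per equal-width bin: the one with the highest MAF.

    Assumption: equal-width bins stand in for "evenly spaced" SNPs.
    Returns the indices of the selected SNPs in map order of their bins.
    """
    bin_width = chrom_length / n_bins
    best: Dict[int, int] = {}
    for idx, (pos, maf) in enumerate(zip(positions, mafs)):
        # Clamp so a SNP sitting exactly at chrom_length falls in the last bin.
        b = min(int(pos / bin_width), n_bins - 1)
        if b not in best or maf > mafs[best[b]]:
            best[b] = idx
    return [best[b] for b in sorted(best)]


def imputation_accuracy(observed: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of imputed genotypes that match the observed ones (concordance)."""
    matches = sum(o == i for o, i in zip(observed, imputed))
    return matches / len(observed)


if __name__ == "__main__":
    # Toy data: four SNPs on a chromosome of length 100, split into two bins.
    panel = select_ld_panel([10, 40, 60, 90], [0.1, 0.4, 0.3, 0.2], 2, 100)
    print(panel)
    print(imputation_accuracy([0, 1, 2, 1], [0, 1, 2, 2]))
```

Under these assumptions, the genotype (imputation) error rate is simply one minus the concordance returned by `imputation_accuracy`. A panel that also weights trait association would need an additional per-SNP score, which this sketch deliberately leaves out.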
Keywords: Holstein; Imputation; Genomic prediction; Low-density SNP chips
Analysis of variance
Daughter pregnancy rate
Genomic-estimated breeding value
Genotype (imputation) error rate
Genomic prediction accuracy
Loss in genomic prediction accuracy
Minor allele frequencies
Markov chain Monte Carlo
Predicted transmitting abilities
Relative genomic prediction accuracy
Relative total maximum gap length
Total maximum gap length
JH and JX analyzed the data. JH and XW drafted the manuscript. XW, JL, SB, GM, SK, and MS participated in the design and discussions of this research. All authors have proofread and approved the final manuscript.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest in this work.
JH, JX, and JL acknowledge the financial support from the University of Nebraska–Lincoln and GeneSeek (a Neogen company). HJ was also supported by the Bairen Plan of Hunan Province, China (XZ2016-08-07) and the Hunan Co-Innovation Center of Animal Production Safety, China.