Abstract
The snpReady R package is a new tool developed to help breeders in genomic projects such as genomic prediction and association studies. The package offers three different methods to build the genomic relationship matrix, a new imputation method for missing markers based on Wright’s theory, and a population genetic overview. These features are implemented in three functions (raw.data, G.matrix, and popgen). The tool transforms raw data from different genotyping platforms into numeric matrices and performs quality control (missing data and allele frequency). Moreover, the package generates and exports four different relationship matrices (following Yang et al. (2010), VanRaden (2008), and the Gaussian kernel), depending on the purpose and the software to be used in further analysis. Finally, based on the genotypic matrix, the package estimates genetic variability, effective population size, and endogamy, among other population genetic parameters. Empirical comparisons between the proposed imputation method and other well-known approaches showed a lower imputation accuracy for the proposed method, but with no significant impact on genomic prediction accuracy when only a low amount of missing data is allowed. The functions and arguments were designed to prepare genomic datasets in a straightforward, fast, and computationally efficient way.
The package and its details are available on CRAN and at http://www.github.com/italo-granato/snpReady.
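To make the quality-control and relationship-matrix steps described in the abstract more concrete, the following is a minimal sketch in plain Python, not the package's own R code. It assumes genotypes coded as 0/1/2 copies of a counted allele with `None` for missing calls, filters markers by call rate and minor allele frequency, and then builds a genomic relationship matrix in the form commonly attributed to VanRaden (2008), G = ZZ'/(2 Σ p(1 − p)) with Z = M − 2p. The function names, the thresholds, and the toy data are hypothetical. Missing calls are replaced by the marker mean purely for illustration; this is not the Wright-based imputation the package proposes.

```python
def allele_freq(column):
    """Frequency of the counted allele at one marker, ignoring missing calls (None)."""
    observed = [g for g in column if g is not None]
    return sum(observed) / (2 * len(observed))


def quality_control(M, call_rate=0.9, maf=0.05):
    """Keep markers that pass call-rate and minor-allele-frequency thresholds.

    Thresholds are illustrative assumptions, not the package defaults.
    """
    n = len(M)
    kept = []
    for j in range(len(M[0])):
        col = [row[j] for row in M]
        n_obs = sum(g is not None for g in col)
        if n_obs == 0 or n_obs / n < call_rate:
            continue  # too many missing calls at this marker
        p = allele_freq(col)
        if min(p, 1 - p) < maf:
            continue  # marker is (nearly) monomorphic
        kept.append(j)
    return [[row[j] for j in kept] for row in M], kept


def vanraden_g(M):
    """G = ZZ' / (2 * sum_j p_j (1 - p_j)), where Z = M - 2p."""
    n, m = len(M), len(M[0])
    p = [allele_freq([row[j] for row in M]) for j in range(m)]
    # Assumption: a missing call is set to the marker mean 2p, i.e. 0 after centering.
    Z = [[0.0 if row[j] is None else row[j] - 2 * p[j] for j in range(m)]
         for row in M]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(Z[i][k] * Z[l][k] for k in range(m)) / denom for l in range(n)]
            for i in range(n)]


# Toy data: 4 individuals (rows) by 4 markers (columns).
genotypes = [
    [0, 1, 2, 0],
    [1, 1, None, 0],
    [2, 0, 0, 0],
    [1, 2, 1, 0],
]

filtered, kept = quality_control(genotypes, call_rate=0.7, maf=0.05)
G = vanraden_g(filtered)
```

In this toy example the last marker is monomorphic and is dropped by the allele-frequency filter, while the third marker survives because its call rate (3 of 4) is above the chosen threshold.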
References
Browning BL, Browning SR (2016) Genotype imputation with millions of reference samples. Am J Hum Genet 98:116–126. https://doi.org/10.1016/j.ajhg.2015.11.020
Buermans HPJ, den Dunnen JT (2014) Next generation sequencing technology: advances and applications. Biochim Biophys Acta—Mol Basis Dis 1842:1932–1941. https://doi.org/10.1016/j.bbadis.2014.06.015
Cooper TA, Wiggans GR, VanRaden PM (2013) Short communication: relationship of call rate and accuracy of single nucleotide polymorphism genotypes in dairy cattle. J Dairy Sci 96:3336–3339. https://doi.org/10.3168/jds.2012-6208
Da Y, Wang C, Wang S, Hu G (2014) Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers. PLoS One 9:e87666. https://doi.org/10.1371/journal.pone.0087666
Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 8:2–9. https://doi.org/10.1111/j.1467-7652.2009.00459.x
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4:250–255. https://doi.org/10.3835/plantgenome2011.08.0024
Forni S, Aguilar I, Misztal I (2011) Different genomic relationship matrices for single-step analysis using phenotypic, pedigree and genomic information. Genet Sel Evol 43:1–7. https://doi.org/10.1186/1297-9686-43-1
Gianola D, Weigel K, Krämer N et al (2014) Enhancing genome-enabled prediction by bagging genomic BLUP. PLoS One 9:e91693. https://doi.org/10.1371/journal.pone.0091693
Hastie T, Tibshirani R, Narasimhan B, Chu G (2017) impute: Imputation for microarray data
He S, Zhao Y, Mette M, Bothe R, Ebmeyer E, Sharbel TF, Reif JC, Jiang Y (2015) Prospects and limits of marker imputation in quantitative genetic studies in European elite wheat (Triticum aestivum L.). BMC Genomics 16:168. https://doi.org/10.1186/s12864-015-1366-y
Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5:e1000529. https://doi.org/10.1371/journal.pgen.1000529
Keller MC, Visscher PM, Goddard ME (2011) Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data. Genetics 189:237–249. https://doi.org/10.1534/genetics.111.130922
de los Campos G, Gianola D, Rosa GJ et al (2010) Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genet Res (Camb) 92:295–308. https://doi.org/10.1017/S0016672310000285
Pérez P, de los Campos G (2014) Genome-wide regression & prediction with the BGLR statistical package. Genetics 198:483–495. https://doi.org/10.1534/genetics.114.164442
Pérez-Elizalde S, Cuevas J, Pérez-Rodríguez P, Crossa J (2015) Selection of the bandwidth parameter in a Bayesian kernel regression model for genomic-enabled prediction. J Agric Biol Environ Stat 20:512–532. https://doi.org/10.1007/s13253-015-0229-y
Powell JE, Visscher PM, Goddard ME (2010) Reconciling the analysis of IBD and IBS in complex trait studies. Nat Rev Genet 11:800–805. https://doi.org/10.1038/nrg2865
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959
Rutkoski JE, Poland J, Jannink J-L, Sorrells ME (2013) Imputation of unordered markers and the impact on genomic selection accuracy. G3-Genes Genom Genet 3:427–439. https://doi.org/10.1534/g3.112.005363
Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17:520–525
Unterseer S, Bauer E, Haberer G, Seidel M, Knaak C, Ouzunova M, Meitinger T, Strom TM, Fries R, Pausch H, Bertani C, Davassi A, Mayer KFX, Schön CC (2014) A powerful tool for genome analysis in maize: development and evaluation of the high density 600 k SNP genotyping array. BMC Genomics 15:823. https://doi.org/10.1186/1471-2164-15-823
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423. https://doi.org/10.3168/jds.2007-0980
Wimmer V, Albrecht T, Auinger H-J, Schön C-C (2012) synbreed: a framework for the analysis of genomic prediction data using R. Bioinformatics 28:2086–2087. https://doi.org/10.1093/bioinformatics/bts335
Wright S (1922) Coefficients of inbreeding and relationship. Am Nat 56:330–338. https://doi.org/10.1086/279872
Yang J, Benyamin B, McEvoy BP et al (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42:565–569. https://doi.org/10.1038/ng.608
Funding
The authors thank the National Council for Scientific and Technological Development (CNPq) and Coordination for the Improvement of Higher Level Personnel (CAPES) for granting scholarships that allowed the creation of this tool.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interests.
About this article
Cite this article
Granato, I.S.C., Galli, G., de Oliveira Couto, E.G. et al. snpReady: a tool to assist breeders in genomic analysis. Mol Breeding 38, 102 (2018). https://doi.org/10.1007/s11032-018-0844-8