Recent advances in genetic technologies have given researchers the ability to characterize genetic marker data for large germplasm collections. While some studies are able to capitalize on entire germplasm collections, others, especially those that focus on traits that are difficult to phenotype, instead work with a subset of the collection. Typically, subsets are selected using phenotypic or geographic data. One major hurdle in identifying favorable subsets is selecting a criterion that can be used to quantify the value of a subset. This study compares two such criteria: polymorphism information content, and a new criterion based on kinship matrices, which will be called the mean of transformed kinships. These criteria were explored in terms of their ability to select subsets that are favorable for genome-wide association studies, and in terms of their ability to select subsets that contain a high number of rare phenotypes. Using phenotypic and genotypic data amassed from the USDA Barley Core Collection, evidence was found to support the hypothesis that the mean of transformed kinships is well suited to selecting subsets intended for genome-wide association studies, but the same was not found for polymorphism information content. Conversely, evidence was found to support the hypothesis that polymorphism information content is well suited to selecting subsets intended for rare-phenotype discovery, but the same was not found for the mean of transformed kinships. Tools to select subsets using these two criteria have been released in the R package “GeneticSubsetter.”
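To make the first criterion concrete, the following is a minimal sketch of how polymorphism information content (PIC) could be scored for a candidate subset. It assumes the classical Botstein et al. (1980) definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², averaged over markers; the abstract does not state which PIC variant the study uses, so that choice is an assumption. The function names, the data layout (one allele call per inbred accession per marker), and the toy genotypes are hypothetical and are not the GeneticSubsetter API.

```python
from itertools import combinations


def pic(freqs):
    """PIC of one marker from its allele frequencies (Botstein et al. 1980 form).

    Assumption: this classical definition is used for illustration only;
    the abstract does not specify the exact variant.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq - cross


def allele_freqs(calls):
    """Allele frequencies from one marker's calls; None marks missing data."""
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("marker has no observed calls in this subset")
    return [observed.count(a) / len(observed) for a in sorted(set(observed))]


def mean_pic(genotypes, subset):
    """Mean PIC across markers, using only the accessions listed in `subset`.

    genotypes: list of markers, each a list of allele calls indexed by accession
               (hypothetical layout: one call per inbred accession).
    subset:    list of accession indices.
    """
    scores = []
    for marker in genotypes:
        calls = [marker[i] for i in subset]
        scores.append(pic(allele_freqs(calls)))
    return sum(scores) / len(scores)


# Toy data: 2 markers scored on 4 accessions (hypothetical values).
toy_genotypes = [
    ["A", "A", "B", "B"],
    ["A", "A", "A", "B"],
]

full_score = mean_pic(toy_genotypes, [0, 1, 2, 3])
narrow_score = mean_pic(toy_genotypes, [0, 1])  # both markers monomorphic here
```

Under this sketch, a subset in which every marker is monomorphic scores zero, so a search that maximizes mean PIC favors subsets that retain allelic variation at many markers. The mean of transformed kinships is not sketched here because the abstract does not define the transformation.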
We would like to thank Professor Jennifer Kling and Dr. WTB Thomas for their guidance in developing this project.
Graebner, R.C., Hayes, P.M., Hagerty, C.H. et al. A comparison of polymorphism information content and mean of transformed kinships as criteria for selecting informative subsets of barley (Hordeum vulgare L. s. l.) from the USDA Barley Core Collection. Genet Resour Crop Evol 63, 477–482 (2016). https://doi.org/10.1007/s10722-015-0265-z
- Hordeum vulgare
- Rare phenotypes