Maximization of minority classes in core collections designed for association studies

  • Original Article
  • Tree Genetics & Genomes

Abstract

Core collections are now widely used in plant genetics. The most widely used method to build them, the maximization (M) strategy, selects from a global collection those accessions that maximize the number of different alleles and phenotypic classes (class richness). However, different types of studies call for different core collections. Several years ago, most core collections were developed to make the characterization and use of germplasm collections easier by reducing sample size, for either conservation or breeding purposes; today, they are widely employed for association studies, which are broadly applied in plant genetic improvement. Under the M strategy, some alleles or phenotypic classes often appear at very low frequency, which may reduce the power of the analysis and prevent the detection of real associations (false negatives). In this work, we propose and evaluate a new way to build core collections that applies the maximization strategy in several sequential steps to maximize the frequency of minority classes, thereby increasing the statistical power of the association study.
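
The idea of a sequential build can be illustrated with a minimal Python sketch. It is not the authors' procedure and does not reproduce MSTRAT or PowerCore: it assumes a simple greedy search as a stand-in for the M strategy, and the function names, the `min_count` threshold, and the toy accessions are all hypothetical. The first step selects accessions that add new phenotypic classes; the second step keeps that selection fixed and adds accessions carrying classes that are still rare in the core.

```python
from collections import Counter


def class_counts(core, data):
    """Count how many accessions in the core carry each (trait, class) pair."""
    return Counter(pair for acc in core for pair in data[acc].items())


def greedy_richness_core(data, size):
    """Greedy stand-in for the M strategy (an assumption, not MSTRAT):
    repeatedly add the accession contributing the most classes not yet covered."""
    core, covered = [], set()
    remaining = sorted(data)
    while len(core) < size and remaining:
        best = max(remaining, key=lambda a: len(set(data[a].items()) - covered))
        core.append(best)
        covered |= set(data[best].items())
        remaining.remove(best)
    return core


def boost_minority_classes(data, core, size, min_count):
    """Sequential step: keep the richness core fixed and add accessions that
    carry classes represented fewer than min_count times in the core."""
    core = list(core)
    remaining = [a for a in sorted(data) if a not in core]
    while len(core) < size and remaining:
        counts = class_counts(core, data)

        def gain(acc):
            # Number of under-represented classes this accession would raise.
            return sum(1 for pair in data[acc].items() if counts[pair] < min_count)

        best = max(remaining, key=gain)
        if gain(best) == 0:
            break  # every class already reaches min_count
        core.append(best)
        remaining.remove(best)
    return core


# Hypothetical toy data: accession -> {trait: phenotypic class}
data = {
    "acc1": {"budburst": 1, "color": "red"},
    "acc2": {"budburst": 1, "color": "white"},
    "acc3": {"budburst": 9, "color": "red"},
    "acc4": {"budburst": 9, "color": "white"},
    "acc5": {"budburst": 5, "color": "red"},
    "acc6": {"budburst": 1, "color": "red"},
}

richness_core = greedy_richness_core(data, size=3)
boosted_core = boost_minority_classes(data, richness_core, size=5, min_count=2)
print(richness_core)
print(boosted_core)
print(class_counts(boosted_core, data))
```

In this toy case the richness step covers every class, but most of them only once; the second step then spends the remaining slots on accessions that bring rare classes up to the chosen threshold, which is the property the abstract links to statistical power.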

References

  • Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y et al (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–631. doi:10.1038/nature08800

  • Bacilieri R, Lacombe T, Le Cunff L, Vecchi-Staraz MD, Laucou V, Genna B, Péros J-P, This P, Boursiquot J-M (2013) Genetic structure in cultivated grapevines is linked to geography and human selection. BMC Plant Biol 13. doi:10.1186/1471-2229-13-25

  • Bataillon TM, David JL, Schoen DJ (1996) Neutral genetic markers and conservation genetics: simulated germplasm collections. Genetics 144:409–417

  • Bordes J, Ravel C, Jaubertie JP, Duperrier B, Gardet O, Heumez E, Pissavy AL, Charmet G, Gouis JL, Balfourier F (2013) Genomic regions associated with the nitrogen limitation response revealed in a global wheat core collection. Theor Appl Genet 126:805–822

  • Brown AHD (1989) Core collections—a practical approach to genetic-resources management. Genome 31:818–824

  • Del Carpio DP, Basnet RK, De Vos RCH, Maliepaard C, Paulo MJ, Bonnema G (2011) Comparative methods for association studies: a case study on metabolite variation in a Brassica rapa core collection. PLoS One 6:e19624. doi:10.1371/journal.pone.0019624

  • Escribano P, Viruel MA, Hormaza JI (2008) Comparison of different methods to construct a core germplasm collection in woody perennial species with simple sequence repeats markers. A case study in cherimoya (Annona cherimola, Annonaceae) an underutilised subtropical fruit tree species. Ann Appl Biol. doi:10.1111/j.1744-7348.2008.00232.x

  • Evanno G, Regnaut S, Goudet J (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol 14:2611–2620

  • Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164:1567–1587

  • Fernandez L, Le Cunff L, Tello J, Lacombe T, Boursiquot JM, Fournier-Level A, Bravo G, Lalet S, Torregrosa L, This P, Martinez-Zapater JM (2014) Haplotype diversity of VvTFL1A gene and association with cluster traits in grapevine (V. vinifera). BMC Plant Biol 14. doi:10.1186/s12870-014-0209-3

  • Franco J, Crossa J, Taba S, Shands H (2005) A sampling strategy for conserving genetic diversity when forming core subsets. Crop Sci 45:1035–1044. doi:10.2135/cropsci2004.0292

  • Franco J, Crossa J, Warburton ML, Taba S (2006) Sampling strategies for conserving maize diversity when forming core subsets using genetic markers. Crop Sci 46:854–864

  • Frankel OH (1984) Genetic perspectives of germplasm conservation. Genetic manipulation: impact on man and society. Cambridge University Press, Cambridge, pp 161–170

  • Gonzalez-Martinez SC, Huber D, Ersoz E, Davis JM, Neale DB (2008) Association genetics in Pinus taeda L. II. Carbon isotope discrimination. Heredity 101:19–26

  • Gouesnard B, Bataillon TM, Decoux G, Rozale C, Schoen DJ, David JL (2001) MSTRAT: an algorithm for building germ plasm core collections by maximizing allelic or phenotypic richness. J Hered 92:93–94

  • Hardy OJ, Vekemans X (2002) SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes 2:618–620

  • Holbrook CC, Anderson WF (1995) Evaluation of a core collection to identify resistance to late leafspot in peanut. Crop Sci 35:1700–1702

  • Ibáñez J, Vargas AM, Palancar M, Borrego J, de Andrés MT (2009) Genetic relationships among table-grape varieties. Am J Enol Vitic 60:35–42

  • Khan M, Korban S (2012) Association mapping in forest trees and fruit crops. J Exp Bot 63:4045–4060. doi:10.1093/jxb/ers105

  • Kim K-W, Chung H-K, Cho G-T, Ma K-H, Chandrabalan D, Gwag J-G, Kim T-S, Cho E-G, Park Y-J (2007) PowerCore: a program applying the advanced M strategy with a heuristic search for establishing core sets. Bioinformatics 23:2155–2162

  • Kwon S-J, Brown AF, Hu J, McGee R, Watt C, Kisha T, Timmerman-Vaughan G, Grusak M, McPhee KE, Coyne CJ (2012) Genetic diversity, population structure and genome-wide marker-trait association analysis emphasizing seed nutrients of the USDA pea (Pisum sativum L.) core collection. Genes & Genomics 34:305–320

  • Le Cunff L, Fournier-Level A, Laucou V, Vezzuli S, Lacombe T, Adam-Blondon A, Boursiquot JM, This P (2008) Construction of nested genetic core collections to optimize the exploitation of natural diversity in Vitis vinifera L. subsp sativa. BMC Plant Biol 8. doi:10.1186/1471-2229-8-31

  • Li X, Yan W, Agrama H, Jia L, Shen X, Jackson A, Moldenhauer K, Yeater K, McClung A, Wu D (2011) Mapping QTLs for improving grain yield using the USDA rice mini-core collection. Planta 234:347–361

  • Liu X, Huang M, Fan B, Buckler ES, Zhang Z (2016) Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet 12(2), e1005767. doi:10.1371/journal.pgen.1005767

  • McKhann HI, Camilleri C, Bérard A, Bataillon TM, David JL, Reboud X, Le Corre V, Caloustian C, Gut IG, Brunel D (2004) Nested core collections maximizing genetic diversity in Arabidopsis thaliana. Plant J 38:193–202

  • Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang ZW, Costich DE, Buckler ES (2009) Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell 21:2194–2202

  • Odong TL, Jansen J, van Eeuwijk FA, van Hintum TJL (2013) Quality of core collections for effective utilisation of genetic resources review, discussion and interpretation. Theor Appl Genet 126. doi:10.1007/s00122-012-1971-y

  • Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155:945–959

  • Ritland K (1996) Estimators for pairwise relatedness and individual inbreeding coefficients. Genet Res 67:175–185

  • Ronfort J, Bataillon T, Santoni S, Delalande M, David JL, Prosperi J-M (2006) Microsatellite diversity and broad scale geographic structure in a model legume: building a set of nested core collection for studying naturally occurring variation in Medicago truncatula. BMC Plant Biol 6:28–40

  • Schoen DJ, Brown AHD (1993) Conservation of allelic richness in wild crop relatives is aided by assessment of genetic markers. Proc Natl Acad Sci 90:10623–10627

  • Shin J, Lee C (2015) Statistical power for identifying nucleotide markers associated with quantitative traits in genome-wide association analysis using a mixed model. Genomics 105:1–4

  • Soto-Cerda BJ, Diederichsen A, Ragupathy R, Cloutier S (2013) Genetic characterization of a core collection of flax (Linum usitatissimum L.) suitable for association mapping studies and evidence of divergent selection between fiber and linseed types. BMC Plant Biol 13. doi:10.1186/1471-2229-13-78

  • Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A, Cloutier S (2014) Association mapping of seed quality traits using the Canadian flax (Linum usitatissimum L.) core collection. Theor Appl Genet 127:881–896. doi:10.1007/s00122-014-2264-4

  • Upadhyaya H, Wang Y, Sharma S, Singh S (2012) Association mapping of height and maturity across five environments using the sorghum mini core collection. Genome 55:471–479. doi:10.1139/g2012-034

  • Upadhyaya H, Wang Y, Gowda C, Sharma S (2013) Association mapping of maturity and plant height using SNP markers with the sorghum mini core collection. Theor Appl Genet 126:2003–2015. doi:10.1007/s00122-013-2113-x

  • van Hintum TJL, Brown AHD, Spillane C, Hodgkin T (2003) Colecciones núcleo de recursos fitogenéticos. Boletín Técnico del IPGRI 3

  • Vargas A, Fajardo M, Borrego J, de Andrés MT, Ibáñez J (2013a) Polymorphisms in VvPel associate with variation in berry texture and bunch size in the grapevine. Aust J Grape Wine Res 19:193–207. doi:10.1111/ajgw.12029

  • Vargas AM, Le Cunff L, This P, Ibáñez J, de Andrés MT (2013b) VvGAI1 polymorphisms associate with variation for berry traits in grapevine. Euphytica 191:85–98. doi:10.1007/s10681-013-0866-6

  • Wang ML, Sukumaran S, Barkley NA, Chen Z, Chen CY, Guo B, Pittman RN, Stalker HT, Holbrook CC, Pederson GA, Yu J (2011) Population structure and marker–trait association analysis of the US peanut (Arachis hypogaea L.) mini-core collection. Theor Appl Genet 123:1307–1317

  • Weir BS (2010) Statistical genetic issues for genome-wide association studies. Genome 53(11):869–875

  • Whitt S, Buckler E (2003) Using natural allelic diversity to evaluate gene function. Methods Mol Biol 236:123–140

  • Zhang Z, Ersoz E, Lai C-Q, Todhunter RJ, Tiwari HK, Gore MA, Bradbury PJ, Yu J, Arnett DK, Ordovas JM, Buckler ES (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet 42(4):355–360. doi:10.1038/ng.546

  • Zhao J, Artemyeva A, Del Carpio D, Basnet R, Zhang N, Gao J, Li F, Bucher J, Wang X, Visser R, Bonnema G (2010) Design of a Brassica rapa core collection for association mapping studies. Genome 53:884–898. doi:10.1139/G10-082

  • Zorić M, Dodig D, Kobiljski B, Quarrie S, Barnes J (2012) Population structure in a wheat core collection and genomic loci associated with yield under contrasting environments. Genetica 140:259–275

Acknowledgments

This study was made possible by funding from the GrapeGen project (a joint venture between Genome Canada and Genoma España) and from projects AGL2010-15694 (MINECO) and AGL2014-59171-R (MINECO, FEDER). A.M. Vargas was funded by a predoctoral fellowship from the Instituto Madrileño de Investigación y Desarrollo Rural, Agrario y Alimentario. We thank Rafael Torres-Pérez for help with the structure analysis and Joaquín Borrego, Carmen Fajardo, Carlos González Guillén, Mª Dolores Vélez, Silvia Hernáiz, Paz Fernández, Nuria Rodríguez Jiménez, and Concepción López Rivas for their technical assistance with the grapevine morphological descriptions.

Data archiving statement

The phenotypic dataset for 1000 rice accessions was downloaded from http://www.genebank.go.kr/eng/PowerCore/ (Kim et al. 2007) and is included in the supplementary material. The grapevine dataset is also included in the supplementary material, with a list of the accessions used. Additional information about the accessions of the IMIDRA grapevine collection (ESP080) can be found on the website http://www.madrid.org/coleccionvidencin/.

Author information

Corresponding author

Correspondence to Javier Ibáñez.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interests.

Research involving human participants and/or animals

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by D. Grattapaglia

Electronic supplementary material

Below is the link to the electronic supplementary material.

Online Resource 1

Global data and core collections obtained for rice (Kim et al. 2007). (XLSX 624 kb)

Online Resource 2

Global data and core collections obtained for grapevine (XLSX 364 kb)

Online Resource 3

Simulation of SNP bi-allelic data correlated with class 9 of the trait “Time of Budburst”. (XLSX 28 kb)

Online Resource 4

Frequency thresholds used to select accessions for the kernel file in Method 3. (XLSX 10 kb)

Online Resource 5

Population stratification in the different collections using the software Structure and the Evanno correction (ΔK). a) ΔK for each collection; b) Structure plot of estimates of membership coefficient (Q) for the run with the highest LnP(D) in each collection. Each accession is represented by a single vertical line, which represents Q coefficients, proportionally in different colors for the different populations. (PPTX 162 kb)

Online Resource 6

Population assignment of each accession in the different core collections considering a membership coefficient threshold of 0.8. (XLSX 22 kb)

Online Resource 7

Distribution of accessions belonging to each population in the different collections. (XLSX 9 kb)

Online Resource 8

Simulated genotypic data for association analysis with “Time of Budburst”. (XLSX 26 kb)

About this article

Cite this article

Vargas, A.M., de Andrés, M.T. & Ibáñez, J. Maximization of minority classes in core collections designed for association studies. Tree Genetics & Genomes 12, 28 (2016). https://doi.org/10.1007/s11295-016-0988-9

  • DOI: https://doi.org/10.1007/s11295-016-0988-9
