Capturing haplotypes in germplasm core collections using bioinformatics

  • Short Communication
Genetic Resources and Crop Evolution

Abstract

Genomewide data sets of single nucleotide polymorphisms (SNPs) offer great potential to improve ex situ conservation. Two factors impede their use for producing core collections. First, due to the large number of SNPs, the assembly of collections that maximize diversity may be intractable using existing, serial software algorithms. Second, the effect of the natural partitioning of the genome into linked regions, or haplotype blocks, on the optimization of collections, and the capture of diversity, is unknown. To address the first problem, we report the development of a parallel computer program, M+, for identifying optimized core collections from arbitrarily large genotypic data sets on high performance computing systems. With respect to the second problem, we use three exemplar data sets to show that, as haplotype block length increases, the number of accessions necessary to capture a predetermined proportion of genomewide haplotypic variation also increases. This relationship is asymptotic such that the minimum haplotype block length suitable for assembling core collections can be empirically determined, and the number of accessions necessary to capture a given percentage of the haplotypic diversity present in the entire collection can be estimated, even when true haplotype structure is unknown. Additionally, we test whether simple geographic or environmental information can be used to produce core collections with elevated genomewide haplotypic diversity. We find this opportunity to be limited, and dependent on natural history and improvement status.
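The relationship between haplotype block length and core collection size can be illustrated with a small sketch. This is not the M+ program and makes no claim about its algorithm: it swaps in a simple serial greedy heuristic, assumes phased biallelic genotypes coded 0/1, and cuts the genome into fixed-length consecutive blocks. The function names `block_haplotypes` and `greedy_core` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Set, Tuple

# A haplotype is identified by (block index, alleles observed within that block).
Haplotype = Tuple[int, Tuple[int, ...]]


def block_haplotypes(genotype: Sequence[int], block_len: int) -> Set[Haplotype]:
    """Cut one phased genotype into consecutive fixed-length blocks (assumed block model)."""
    return {
        (start // block_len, tuple(genotype[start:start + block_len]))
        for start in range(0, len(genotype), block_len)
    }


def greedy_core(genotypes: List[Sequence[int]], block_len: int, target: float) -> List[int]:
    """Greedily pick accessions until `target` of all distinct block haplotypes is captured.

    Greedy selection is a stand-in heuristic, not the optimization used by M+.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    per_accession = [block_haplotypes(g, block_len) for g in genotypes]
    all_haps: Set[Haplotype] = set().union(*per_accession)
    needed = target * len(all_haps)
    captured: Set[Haplotype] = set()
    core: List[int] = []
    while len(captured) < needed:
        # Choose the unused accession that adds the most not-yet-captured haplotypes.
        best = max(
            (i for i in range(len(genotypes)) if i not in core),
            key=lambda i: len(per_accession[i] - captured),
        )
        core.append(best)
        captured |= per_accession[best]
    return core


# Toy collection: four accessions, four SNPs each (hypothetical data).
toy = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]

for block_len in (1, 2, 4):
    core = greedy_core(toy, block_len, target=1.0)
    print(f"block length {block_len}: {len(core)} accessions {core}")
```

On this toy collection, capturing all haplotypes takes two accessions at block lengths 1 and 2 but all four at block length 4, which mirrors in miniature the trend the abstract reports: longer blocks require more accessions to capture the same proportion of haplotypic variation.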

Acknowledgements

We thank Ole Tange for developing GNU Parallel, which was essential for data analysis, and Ariane Boehm for assistance in determining the algorithmic complexity of M+.

Author information

Corresponding author

Correspondence to Patrick A. Reeves.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Reeves, P.A., Richards, C.M. Capturing haplotypes in germplasm core collections using bioinformatics. Genet Resour Crop Evol 64, 1821–1828 (2017). https://doi.org/10.1007/s10722-017-0549-6

  • DOI: https://doi.org/10.1007/s10722-017-0549-6
