Abstract
Genomewide data sets of single nucleotide polymorphisms (SNPs) offer great potential to improve ex situ conservation. Two factors impede their use for producing core collections. First, due to the large number of SNPs, the assembly of collections that maximize diversity may be intractable using existing, serial software algorithms. Second, the effect of the natural partitioning of the genome into linked regions, or haplotype blocks, on the optimization of collections, and the capture of diversity, is unknown. To address the first problem, we report the development of a parallel computer program, M+, for identifying optimized core collections from arbitrarily large genotypic data sets on high performance computing systems. With respect to the second problem, we use three exemplar data sets to show that, as haplotype block length increases, the number of accessions necessary to capture a predetermined proportion of genomewide haplotypic variation also increases. This relationship is asymptotic such that the minimum haplotype block length suitable for assembling core collections can be empirically determined, and the number of accessions necessary to capture a given percentage of the haplotypic diversity present in the entire collection can be estimated, even when true haplotype structure is unknown. Additionally, we test whether simple geographic or environmental information can be used to produce core collections with elevated genomewide haplotypic diversity. We find this opportunity to be limited, and dependent on natural history and improvement status.
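The relationship between haplotype block length and core size described above can be illustrated with a small sketch. It is not the M+ algorithm: it uses a simple greedy heuristic, assumes phased haploid (or fully inbred) accessions encoded as equal-length strings of SNP alleles, and treats a block as a fixed number of consecutive SNPs. The names `block_haplotypes`, `greedy_core`, and `accessions` are hypothetical and chosen only for this illustration.

```python
def block_haplotypes(genotypes, block_len):
    """Return, per accession, the set of (block start, haplotype) pairs.

    Assumption: blocks are fixed-length runs of consecutive SNPs.
    """
    n_snps = len(genotypes[0])
    starts = range(0, n_snps, block_len)
    return [{(s, g[s:s + block_len]) for s in starts} for g in genotypes]


def greedy_core(genotypes, block_len, target=0.9):
    """Greedily pick accessions until `target` of all haplotypes are captured.

    This is a plain greedy heuristic for illustration only, not M+.
    Returns the indices of the chosen accessions in selection order.
    """
    per_acc = block_haplotypes(genotypes, block_len)
    all_haps = set().union(*per_acc)
    needed = target * len(all_haps)
    captured, core = set(), []
    remaining = set(range(len(genotypes)))
    while len(captured) < needed:
        # Choose the accession adding the most new haplotypes;
        # ties go to the lowest index so results are reproducible.
        best = max(remaining, key=lambda i: (len(per_acc[i] - captured), -i))
        captured |= per_acc[best]
        core.append(best)
        remaining.remove(best)
    return core


# Toy data (invented for this sketch): 4 accessions, 4 biallelic SNPs.
accessions = ["0000", "0011", "1100", "1111"]

for block_len in (1, 2, 4):
    core = greedy_core(accessions, block_len, target=1.0)
    print(block_len, len(core))
```

On this toy data, capturing every haplotype takes 2 accessions at block lengths 1 and 2, and all 4 accessions at block length 4, which matches the direction of the trend reported in the abstract. Real data sets would need phasing, missing-data handling, and a stronger optimizer than this greedy pass.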
Acknowledgements
We thank Ole Tange for developing GNU Parallel, which was essential for data analysis, and Ariane Boehm for assistance in determining the algorithmic complexity of M+.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Reeves, P.A., Richards, C.M. Capturing haplotypes in germplasm core collections using bioinformatics. Genet Resour Crop Evol 64, 1821–1828 (2017). https://doi.org/10.1007/s10722-017-0549-6