Background

Beta vulgaris L. (beet) is an economically important plant species consisting of several distinct cultivated lineages (B. vulgaris subsp. vulgaris) These lineages, or “crop types,” include sugar beet, table beet, fodder beet, and chard. The crop types have been adapted for specific end uses and thus exhibit pronounced phenotypic differences. Crop type lineages breed true, indicating a genetic basis for these phenotypes. Cultivated beets likely originated from wild progenitors of B. vulgaris subsp. maritima, also called “sea beet” [5]. It is widely accepted that beet populations were first consumed for leaves. The earliest evidence for lineages with expanded roots occurs in Egypt around 3500 BC. The root types and the origin of the enlarged root is thought to have occurred in the Near East (Iraq, Iran, and Turkey) and spread west (Europe) [50]. Interestingly, beet production for roots as an end use was first described along trade routes across Europe. Historically, Venice represented a major European market of the Silk Road and facilitated the distribution of eastern goods across Europe [24]. Table beet has been proposed to have been developed within Persian and Assyrian gardens [21]. Whether this specifically corresponds to the origin of the expanded root character or a restricted table beet phenotype remains unknown. In fact, early written accounts regarding the use of root vegetables often confused beet with turnip (Brassica rapa).

Hybridization between diverged beet lineages has long been recognized as a source of genetic variability available for the selection of new crop types and improving adaptation ([42] cited in [10, 49]). In 1747, Margraff was the first to recognize the potential for sucrose extraction from beet. Achard, a student of Margraff, was the first to describe specific fodder lineages that contained increased quantities of sucrose and the potential for an economically viable source of sucrose for commoditization [49]. In 1787, Abbe de Commerell suggested red mangle (fodder) resulted from a red table beet/chard hybrid and that the progenitors of sugar beet arose from hybridizations between fodder and chard lineages [17, 18]. Louise de Vilmorin (1816–1860), a French plant breeder, first detailed the concept of progeny selection in sugar beet, a method of evaluating the genetic merit of lineages based on progeny performance [20]. Vilmorin used differences in specific gravity as a measure to select beet lineages and increase sucrose content. This approach led to increases in sucrose concentration from ~ 4% in fodder beet to ~ 18% in current US hybrids (reviewed in [35]).

B. vulgaris is a diploid organism (2n = 18) with a predicted genome size of 758 Mb [4]. Chromosomes at metaphase exhibit similar morphology [39]. The first complete reference genome for B. vulgaris (e.g., RefBeet) provided a new perspective regarding the content of the genome (e.g., annotated gene models, repeated sequences, and pseudomolecules) [15]. This research confirmed whole genome duplications and generated a broader view of genome evolution in the Eudicots, Caryophyllales, and Beta. The EL10.1 reference genome [19] represents a contiguous chromosome scale assembly resulting from a combination of PacBio long-read sequencing, BioNano optical mapping and Hi-C linking libraries. Together, EL10.1 and RefBeet provide new opportunities for studying the content and organization of the beet genome. Resequencing of important beet accessions has the potential to characterize the landscape of variation and inform recent demographic history of beet, including the development of crop types and other important lineages.

Population genetic inference leveraging whole genome sequencing (WGS) data have proven powerful tools for understanding evolution from a population perspective [8, 29, 43]. Knowledge of the quantity and distribution of genetic variation within a species is critical for the conservation and preservation of genetic resources in order to harness the evolutionary potential required for the success of future beet cultivation. Recent research has revealed the complexity of relationships within B. vulgaris crop types [2]. Studies have shown sugar beet is genetically distinct and exhibits reduced diversity compared to B. vulgaris subsp. maritima. Geography and environment are major factors in the distribution of genetic variation within sugar beet accessions in the US [33]. Furthermore, spatial and environmental factors were evident in the complex distribution of genetic variation in wide taxonomic groups of Beta [1], which include the wild progenitors of cultivated beet.

Here we present a hierarchical approach to characterize the genetic diversity of cultivated B. vulgaris using pooled sequencing of accessions representing the crop type lineages. These accessions contain a wide range of phenotypic variation including leaf and root traits, distinct physiological/biochemical variation in sucrose accumulation, water content, and the accumulation and distribution of pigments (e.g., betaxanthin and betacyanin). These phenotypic traits, along with disease resistance traits, represent the major economic drivers of beet production. Developmental genetic programs involved in cell division, tissue patterning, and organogenesis likely underlie the differences in root and leaf quality traits observed between crop types. Improvement for these traits as well as local adaptation and disease resistance occurs at the level of the population. Pooled sequencing provides a means to characterize the diversity of important beet lineages and survey the nucleotide variation, which has utility in marker-based approaches across a diverse community of breeders and researchers interested in B. vulgaris. Pooled sequencing works in synergy with both the reproductive biology of the crop as well as the means by which phenotypic diversity is evaluated (e.g., population mean phenotype) and beets are improved through selection. The genetic control of important beet traits, currently unknown, will help prioritize existing variation and access novel sources of trait variation in order to address the most pressing problems related to crop productivity and sustainability.

Results

Twenty-five individuals from each of the 23 B. vulgaris accessions were chosen to represent the cultivated B. vulgaris crop types (Table 1 and Fig. 1). Leaf tissue was pooled, DNA extracted and sequenced using the Illumina 2500 in paired end format. On average, 61.84 ± 12.22 GB of sequence data was produced per accession, with an average depth of 81.5X. After processing for quality, reads were aligned to the EL10.1 reference genome. Approximately 20% of bases were discarded owing to trimming of low-quality base calls and adapter sequences. Biallelic SNP and lineage-specific variants were used to estimate the quantity and organization of genome-wide variation within these B. vulgaris populations and groups (e.g., species, crop types, and accessions). On average 90.74% of the filtered reads aligned to the EL10.1 reference genome. A total of 14,598,354 variants were detected across all accessions, and 12,411,164 (85.0%) of these were classified as a SNP, and of these 10,215,761 (82.3%) were biallelic. Thus, most SNP variants appeared to be biallelic, as only 2,718,205 (18.6%) of the SNP variants were characterized as multiallelic. After filtering for read depth (n ≥ 15), 8,461,457 biallelic SNPs remained for computational analysis. Insertions and deletions (indels) were called using GATK (370,260) (Table 2), which served to reduce false variants resulting from misalignments. This represented a large reduction from the 2,187,190 indels called using the bcftools pipeline.

Table 1 List of materials for sequencing
Fig. 1
figure 1

Phenotypes of B. vulgaris showing crop type characteristics are distinguishable by 9-weeks of age. Color bars refer to crop type in subsequent figures

Table 2 SNP and Indel variation in cultivated B. vulgaris. Gene diversity (2pq) indicates the diversity and expected genetic variation within populations

AMOVA was performed in order to quantify the distribution of variation within and among cultivated B. vulgaris crop types. The results showed no strong population subdivision with respect to crop type. The variation shared among crop types (99.37%), far exceeded the variation apportioned between crop type lineages (0.40%). The variation detected between accessions within a crop type was also low (0.23%) (Table 3). This result suggested a small proportion of the total variation is unique to any given accession. This was confirmed by the low quantity of lineage-specific variation (LSV) detected, evaluated in a hierarchical fashion. Lineages were defined as individual accessions, crop types, and species (Table 2). In total, 600,239 variants (4.0%) were unique and fixed within a single accession. The accumulation of variation on specific chromosomes for each accession was informative (Table 4). Individual accessions of sugar beet contained a large quantity of LSV on Chromosome 6 relative to other sugar beet chromosomes and indicated that either divergent selection or drift has occurred on this sugar beet chromosome. The variety, ‘Bulls Blood’ (BBTB), contained the greatest amount of LSV detected, 8893 indels and 79,236 SNP variants (Table 2). Table beet accessions contained the most LSV overall which suggested Table Beet is the most divergent of the crop type (Table 4).

Table 3 Analysis of molecular variance (AMOVA)
Table 4 Number of lineage-specific SNP and indel variants along chromosomes

Within the crop types, 10,661 variants were crop type specific and were not found within any other crop type. Of these, 8098 were characterized as SNPs and 1963 as indels. The number of SNP LSV detected within sugar beet, table beet, fodder beet, and chard crop types were as follows: 3317, 1379, 643, and 3359, respectively (Table 2). Indel LSV detected for the crop types were 342, 558, 205, and 858, respectively (Table 2). Diversity contained within the species, crop type, and individual accessions was estimated using expected heterozygosity (2pq) (Table 2 and Fig. 2). Expected heterozygosity (2pq) varied from 0.027 in our inbred reference EL10 sugar beet accession to 0.253 in the recurrent selection sugar beet breeding population GP9. Within the crop types, the mean expected heterozygosity for sugar beet was 0.207, table beet = 0.147, fodder beet = 0.221, and chard = 0.216 (Table 2). Interestingly, chard contained the most LSV of the crop types yet showed high diversity (2pq), suggesting unique variation supports the divergence of this lineage.

Fig. 2
figure 2

Gene diversity/expected heterozygosity (2pq) of B. vulgaris lineages. a Populations, b Crop types, and c Species. Colors are coded according to crop type and values are present in Table 2

The expected heterozygosity (2pq) for accessions such as EL10 and W357B was low. This was expected owing to inbreeding via the presence of self-fertility alleles in these two accessions. These accession EL10 was excluded from further analysis due to the fact that the sequence data was derived from a single individual. Interestingly, the variety ‘Bulls Blood’ lacked variation relative to other beet accessions, and it is likely that recent selection underlies this result (Chris Becker, personal communication). The variation in diversity estimates as measured by expected heterozygosity (2pq) suggested the level of diversity is highly dependent on the breeding system, selection for end use traits and Ne size.

The variation detected was used to cluster accessions in two ways: (1) a hierarchical clustering based on relationship coefficients estimated using the quantity of shared variation between accessions, and (2) a principal components analysis using allele frequency in each accession, estimated using an IBS (Identity by State) approach. The resulting dendrogram and heatmap showed that the table beet crop type was the only group to have strong evidence (e.g., high relationship coefficients and bootstrap values) supporting it as a unique group harboring significant variation (Table 5). Likewise, the green (LUC and FGSC) and red (RHU and Vulcan) chard accessions showed evidence for two distinct groups (Fig. 3). Sugar beet lineages with known pedigree relationships and high probability for shared variation (e.g., SR98/2 and EL51) also had strong evidence, which supports the delineation of population structure on the basis of shared variation. Additionally, the clade composed of SP7322, SR102, GP10, and GP9 resolved in a similar fashion.

Table 5 Pairwise relationship matrix. Relationship coefficients are indicated above the diagonal, the number of shared variants is indicated below the diagonal, and the number of variants is given on the diagonal
Fig. 3
figure 3

Lineage relationships inferred by hierarchical clustering of pairwise relationship coefficients. a Dendrogram reflects support for clusters. Branch lengths indicate relationship coefficients between lineages, high (blue) and low (red). b Heatmap shows relationship coefficient values for all comparisons. Colors at the bottom and left of heat map represent crop type, sugar beet (blue), fodder beet (orange), chard (green), table beet (red)

PCA used genome-wide allele frequency estimates for individual accessions. The first principal component (PC1) explained 75.6% of the variance in allele frequency and separated the table beet crop type from the other crop types. The second component (PC2) explained 15.25% of the variance (Fig. 4). Sugar and table beets appeared the most divergent and were able to be separated along both dimensions. Chard and fodder crop types were distinguishable but appeared less divergent. Allele frequency estimates analyzed on a chromosome-by-chromosome basis demonstrated that specific chromosomes cluster the accessions by crop type (Fig. 5). Chromosomes 3, 8, and 9 appear to be important for the divergence between sugar beet and other crop types. All chromosomes were able to separate table beet with the exception of Chromosomes 7 and 9.

Fig. 4
figure 4

PCA plot showing the separation of crop types using genome-wide allele frequency data

Fig. 5
figure 5

PCA plot showing the separation of crop types using allele frequency data on a chromosome by chromosome basis. Colors group crop types as in Fig. 4

Finally, using our population genomic data we tested a composite likelihood method to estimate historical effective population size (Ne) to infer demographic histories for crop type lineages. Table beet appears to have a distinct history in this respect as well as one or more demographic separations when compared with the other three lineages. Trends in historical effective population sizes (Ne) for fodder and sugar groups were quite similar to each other, and no early divergence was detected between them. The chard group appeared to share early demographic history with the fodder/sugar group but showed a different trend later, suggesting it diverged early with respect to the other crop types (Fig. 6). The demographic history of B. vulgaris crop type correlates well with historical evidence (e.g., records of antiquity, archeological evidence, and scientific literature) detailing the development of distinct crop type lineages (Table 6).

Fig. 6
figure 6

Inferred historical Ne of B. vulgaris crop types using the program SMC++. Colors group crop types. Red = table beet, blue = sugar beet, green = chard (leaf beet), yellow = fodder beet

Table 6 Historical time line highlighting evidence of beet utilization

Discussion

The accessions sampled here represent divergent lineages used in the cultivation of beet. All have notable breeding histories, which has served to capture and fix genetic variation resulting in predictable phenotypes characteristic of each lineage (e.g. accession or crop type). The organization and distribution of genetic variation within and between accessions reflects the historical selection and evolutionary pressures experienced as these crop types and varieties were developed. Pooled sequencing allowed us to make the cogent genomic comparisons that informs the history of beet development, from ancestral gene pools and domestication to the development of varieties and germplasm within modern breeding programs. Using population genomic data, we were able to support B. vulgaris as a species complex, uncover genomic variation associated with development of beet crop types, and gain fundamental insight into the natural history of beet.

Two biological groups could be identified with high confidence using these data, a table beet group and a group encompassing chard, fodder beet, and sugar beet. Previous research, which used genetic markers to cluster crop types, reported similar findings [1, 30]. The strong evidence for a unique table beet group hints at both genetic drift, resulting from reproductive isolation, as well as positive selection for end use (Figs. 3, 4, 6). In general, selection and drift act to change allele frequency within a population [23], but the effects are relative to the effective population size (Ne) of the populations under selection. Effective population size is an important consideration because it relates to the standing genetic diversity within populations (Crow and Denniston [11, 47]). The patterns of variation resulting from drift and selection are distinct. For example, table beet accessions had low diversity (2pq) relative to other crop types (Table 2), and the ability to separate table beet accessions on the basis of allele frequency is suggestive of selection (Figs. 4 and 5). Relationship coefficients, on the other hand, highlight the differences in the quantity of shared variation within and between crop types (Table 5 and Fig. 3), suggesting table beet may have been less connected to other crop types historically. Allele frequency showed signals of differentiation distributed across all chromosomes for table beet (Fig. 5), likely reflecting both selection and drift. The low quantity of shared variation between crop types did not support long term phylogeographic explanations for the differentiation observed. Long periods of geographic isolation can produce barriers to reproduction, further reinforcing isolation and divergence of populations [40]. This appears not to be the case in cultivated beet, as experimental hybrids between crop types show few barriers to hybridization and produce viable progeny, which does not suggest a large degree of chromosomal variation between the groups. The creation of segregating populations from crosses between sugar and table beet crop types support this observation [26, 34].

The lesser degree of separation between chard, fodder, and sugar crop types may be the result of increased connectivity (e.g., historical gene flow) between these lineages versus table beet. High gene flow exerts a homogenizing effect on the diversity contained within populations and increases the quantity of shared variation. This may explain a lack of clear delineation of these crop types using genome-wide markers. Fodder and sugar crop types could be separated using allele frequency (Fig. 4) but clusters based on shared variation were less clear (Fig. 3). This was not unexpected given the known history between these lineages. The development of fodder lineages that accumulate sucrose have occurred in recent history (~ 200 years), giving rise to the progenitor of sugar beet, the ‘White Silesian’ [17, 49]. Phenotypic divergence between species is attributed more to indel variation than to SNP variation owing to their greater consequences on gene expression and gene regulation [9]. This phenomenon may be visible in population divergence as well as speciation. The high quantity of shared variation between sugar and fodder crop types (Table 5) and the low quantity of indel LSV detected within sugar and fodder crop types (Table 2) likely reflects a shared demographic history relative to comparisons between other crop types (Fig. 6). Interestingly, chard contained the most LSV of the crop types yet showed high diversity (2pq), suggesting some unique variation supports the divergence of this lineage. The larger quantity of shared variation between the sugar beet, fodder beet, and chard crop types versus table beet (Table 5) suggests differences in the extent and timing of gene flow between lineages.

Chard is hypothesized as the first crop type developed from diverse ancestral B. vulgaris subsp. maritima populations [5, 49]. This is supported by the high level of diversity (2pq) (Table 2 and Fig. 2), a high quantity of LSV (Table 2), and interesting trends in the demographic history (Fig. 6). The clear delineation of two distinct chard groups (Fig. 3) suggests major differences in genome composition between the two groups and a unique demographic history for each chard lineage. The chards share similar leaf morphology but the roots of the red chard group were enlarged and had fewer ‘sprangles’ (e.g. adventitious roots branching from the tap root) relative to the green chard accessions but not to the extent as the root types (e.g. sugar, fodder, and table). This may reflect introgressions between the red chard and a root type, and potentially an unintended consequence of chard improvement for color traits.

The enlarged tap root character appears to have been first developed in table beet lineages [5], but the expanded root character is shared across crop type lineages. This suggests several hypotheses: (1) the root character in fodder beet reflects the introgression of this character from a table beet to a chard background and represents a single source for this character [50], (2) an ancestral population gave rise to the root character and diverged into fodder and table lineages, (3) the enlargement evolved several times and contributes to the diversity in shape and form. Historically, it appears admixture, hybridization, and introgression were fundamental to the development of beet lineages. Schukowsky [42] suggested that the broad adaptation of beet to novel growing environments may be due to variation accumulated in geographically diverse ancestral populations and shared via admixture and gene flow between lineages. Trait variation in wild relatives is becoming increasingly important for crop adaptation to a changing growing environment [44]. Distinguishing between sorting ancestral variation and introgression events remains a challenge in population genomic analysis but could yield important insight into beet crop type development, and other cultivated species as well.

The beet crop types have appeared to have diverged by selection. The variation in allele frequency of bi-allelic SNPs for beet accessions was able to distinguish the crop types (Fig. 4). This suggests that the allele frequency data contains signal related to historical selection (Fig. 5). Sugar and table beet appear to be the most diverged, which is consistent with large breeding efforts for each of these crop types. Allele frequency data analyzed on a per chromosome basis demonstrated that only specific chromosomes can differentiate on the basis crop type. Ostensibly the presence of variation located on specific chromosomes is under positive selection for end use, leading to an accumulation of lineage-specific differences including those linked to defining phenotypic characters. In fact, many quantitative trait loci studies support the fact that specific regions along chromosomes contain the variation that ultimately influences phenotype [14]. Interestingly, even small amounts of variation can have profound effects on phenotypic variation [13, 37]. Allele frequency estimates for specific chromosomes as well as the variation in lineage-specific variation for crop type on specific chromosomes suggests a small degree of total genome variation explains beet crop type differences. Given the support for crop type relationships based on allele frequency and degree of shared variation, it appears the divergence of beet crop types occurred in the presence of high gene flow. Population divergence in the presence of gene flow produces distinct patterns of variation with respect to selection [32]. Cryptic relationships within other species complexes have been explained by various models including the islands-of-differentiation model [6, 48].

Admixture and introgression events may have served to share genetic variation across cultivated beet accessions and crop type lineages, which in turn, created challenges for the clear delineation of subpopulations. This is confounded by the fact that, as lineages evolve, a lesser quantity of variation with greater agricultural importance contributes to our notion of economic and agronomic value. Resolving the degree to which historical admixture and introgression has contributed to the development of beet crop type will require more in-depth analysis of the variation at nucleotide level within local chromosome regions.

Conclusions

Beet crop types are important lineages which exhibit both genetic and phenotypic divergence. Sufficient support for treatment of these groups as significant biological units was present from de novo clustering of beet accessions. It would appear selection for end use qualities and genetic drift were major factor in the observed differentiation between lineages and explains the apportionment of genetic variation between crop types at distinct chromosome locations. Common ancestry and admixture and introgression likely maintained levels of genetic variation between crop types and reflects a complex demographic history between crop types. The majority of genetic variation detected in beet crop types was biallelic SNPs, but lineage specific variation may have had a greater role in crop diversification, with table beet showing the greatest degree of differentiation. Most variation is held within the species (as represented by the crop types here), and only a small amount of the total variation is partitioned within individual crop types. Understanding the history of beet crop type diversification, in terms of the evolution of genomes and traits within and between crop types, will help to identify and recover a genetic basis for crop type phenotypes. Directed molecular breeding approaches may be developed to incorporate novel traits from other crop types and wild populations.

Methods

Beta vulgaris accessions and sequencing

Twenty-three beet accessions were sequenced to 80X coverage relative to the predicted 758 Mb B. vulgaris genome using a pooled sequencing approach. The accessions are representative of the four recognized crop types and capture the range of phenotypic diversity found within cultivated beet (Table 1). Accessions were grown in the greenhouse and leaf material was harvested from 25 individuals per accession. Leaf material, one young expanding leaf of similar size from each individual within an accession, was combined, homogenized, and DNA was extracted using the Macherey-Nagel NucleoSpin Plant II Genomic DNA extraction kit (Bethlehem, PA). Libraries were prepared using the Illumina TruSeq DNA Nano Library Preparation Kit. Libraries were QC’d and quantified using a combination of Qubit dsDNA HS, Caliper LabChipGX HS DNA and Kapa Biosystems Illumina Library Quantification qPCR assays. Each set of 8 libraries were pooled in equimolar amounts. Each of these pools was loaded on four (4) lanes of an Illumina HiSeq 2500 High Output flow cell (v4). Sequencing was done using HiSeq SBS reagents (v4) in a 2x125bp paired end format. Base calling was performed by Illumina Real Time Analysis (RTA) v1.18.64 and output of RTA was demultiplexed and converted to FastQ format with Illumina Bcl2fastq v1.8.4. The resulting reads were assessed for quality using FastQC [3], library bar-code adapters were removed, and reads were trimmed according to a quality threshold using TRIMMOMATIC [7] invoking the following options (ILLUMINACLIP:adapters.fa-:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36). These filtered reads were used for downstream analysis.

Data processing and variant detection

Variants for each accession were called by aligning the filtered reads to the EL10.1 reference genome assembly [19] using bowtie2 v2.2.3 (options -q --phred33-quals -k 2 -x) [25]. An insert size distribution was estimated for paired end read mappings (Additional file 1: Figure S1). The resulting alignment files were sorted and merged using SAMtools version 0.1.19 [28]. SNP variants were called for each accession using BCFtools [27], filtered for mapping quality (MAPQ > 20) and read depth (n > 15), and then combined using VCFtools [12]. The combined data was again filtered to obtain biallelic sites across all accessions. Indels were evaluated using the Genome Analysis Toolkit (GATK) haplotype caller [36] following best practices (https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows). Indel size distribution was also calculated (Additional file 2: Figure S2). The ‘mpileup’ subroutine in SAMtools was then used to quantify the alignment files and extract allele counts. Allele frequency was estimated within each accession for SNP loci identified as biallelic across all accessions. Population parameters were then estimated using allele frequencies within each accession such that (p + q = 1), where p was designated as the allele state of the EL10.1 reference genome and q, the alternate, detected in each sequenced accession. Expected heterozygosity (2pq), also termed gene diversity [38], was used to compare diversity contained within each accession.

AMOVA

Analysis of molecular variance (AMOVA) was used to assess the distribution of genetic variation within the species [16]. AMOVA was performed using the ade4 package in R [46] following the approach for pooled sequence data outlined in Gompert et al. [22].

Crop type relationships

Biallelic SNPs were used to calculate pairwise relationship coefficients between accessions using an identity by state (IBS) approach within the Kinship Inference for Association Genetic Studies (KING) package [31]. Neighbor joining trees were generated in order to extract bootstrap support along branches of our phylogram. In total 100 replications were used and analysis was carried out using the ape package (Analyses of Phylogenetics and Evolution) in R (Paradis and Schliep [41]).

Principle components analysis (PCA)

PCA was carried out in R using singular value decomposition function, svd() in R.

Population size history

Composite likelihood methods were used to estimate historical population sizes and infer demographic history from genome sequences of each accession using the program SMC++ version 1.12.1 [45] invoking the commands (smc++ estimate -o analysis/ 1.25e-8) to estimate historical population size and (smc++ split -o split/ pop1/model.final.json pop2/model.final.json) to estimate the joint demography between populations. A mutation rate of 1.25e-8 was assumed used based on the Arabidopsis mutation rate predicted to be between 10e-7 and 10e-8.

Lineage-specific variation

Lineage-specific variation (LSV), defined as homozygous private variation (e.g., apomorphy), was extracted from the merged VCF file containing variants for all accessions. Variants that were fixed within a particular accession or assemblage of accessions (lineage), and not detected within any other lineage, were considered LSV. Variant files representing LSV were produced for each lineage in a hierarchical fashion (e.g., species, crop type and accessions). LSV was then evaluated with respect to lineage as well as its distribution along chromosomes.