Gene flow signature in the S-allele region of cultivated buckwheat
Buckwheat (Fagopyrum esculentum Moench.) is an annual crop that originated in southern China. The nutritious seeds are used in cooking much like cereal grains. Buckwheat is an outcrossing species with heteromorphic self-incompatibility due to its dimorphic (i.e., short- and long-styled) flowers and intra-morph infertility. The floral morphology and intra-morph incompatibility are both determined by a single S locus. Plants with short-styled flowers are heterozygous (S/s) and plants with long-styled flowers are homozygous recessive (s/s) at this locus, and the S/S genotype is not found. Recently, we built a draft genome assembly of buckwheat and identified the 5.4-Mb-long S-allele region harbored by short-styled plants.
In this study, the first report on the genome-wide diversity of buckwheat, we used a genotyping-by-sequencing (GBS) dataset to evaluate the genome-wide nucleotide diversity within cultivated buckwheat landraces worldwide. We also investigated the utility of the S-allele region for phylogenetic analysis of buckwheat.
Buckwheat showed high nucleotide diversity (0.0065), comparable to that of other outcrossing plants, based on a genome-wide simple nucleotide polymorphism (SNP) analysis. Phylogenetic analyses based on genome-wide SNPs showed that cultivated buckwheat comprises two groups, Asian and European, and revealed lower nucleotide diversity in the European group (0.0055) and low differentiation between the Asian and European groups.
The nucleotide diversity (0.0039) estimated from SNPs in the S-allele region is lower than that in genome-wide SNPs. Phylogenetic analysis based on this region detected three diverged groups, S-1, S-2, and S-3.
The SNPs detected using the GBS dataset were effective for elucidating the evolutionary history of buckwheat, and led to the following conclusions: (1) the low nucleotide diversity of the entire genome in the European group and low differentiation between the Asian and European groups suggested genetic bottlenecks associated with dispersion from Asia to Europe, and/or recent intensified cultivation and selection in Europe; and (2) the high diversification in the S-allele region was indicative of gene flows from wild to cultivated buckwheat, suggesting that cultivated buckwheat may have multiple origins.
KeywordsBuckwheat Crop evolution GBS Heteromorphic self-incompatibility
Amplified fragment-length polymorphisms
Granule-bound starch synthases
- S locus
S-locus early flowering 3
Simple nucleotide polymorphisms
Short-style-specific gene 2
simple sequence repeats (SSRs)
Buckwheat (Fagopyrum esculentum Moench.; 2n = 2x = 16), a member of the Polygonaceae family, is an annual crop that originated in southern China [1, 2]. The seeds (strictly achenes) are used as cereal grains in the same way as rice (Oryza sativa) and wheat (Triticum aestivum); because buckwheat does not belong to the Poaceae family, it is often referred to as a pseudo-cereal. Buckwheat has excellent cultivation properties, with a short growing period and tolerance of cool climate and high elevation. Buckwheat is therefore widely cultivated in temperate zones throughout Eurasia and is used in many traditional foods, such as soba (Japanese noodles), memil guksu (Korean noodles), pizzoccheri (Italian pasta), and galettes (French pancakes). Buckwheat seeds are dense in starch and high-quality protein with a well-balanced amino acid composition  and are an important source of dietary fiber, trace elements, and phenolic compounds [4, 5]. Because of its high nutrient content and lack of gluten, buckwheat is now widely cultivated in regions beyond Eurasia, including in the USA, Canada, Australia, and New Zealand.
To identify agronomically useful genes, buckwheat linkage maps have been constructed using various molecular markers, including isozyme variations , simple sequence repeats (SSRs) , and amplified fragment-length polymorphisms (AFLPs) . Recently, a microarray marker system and a genome-wide linkage map for common buckwheat were developed . They later confirmed that genomic selection using genome-wide microarray markers was an effective approach for improving buckwheat yield . In addition, our research group constructed the Buckwheat Genome DataBase (BGDB) . Using BGDB, various agronomically useful genes, such as those controlling flavonoid biosynthesis and encoding 2S albumin-type allergens and granule-bound starch synthases (GBSSs), have been identified [11, 12, 13, 14]. Thus, the genetic tools for buckwheat breeding are highly developed.
Evaluating genetic diversity is a crucial step for exploring agronomically useful genes in crop species. Buckwheat is an outcrossing species with heteromorphic self-incompatibility (SI) due to its dimorphic flower types (i.e., short- and long-styled flowers), each incompatible with flowers of the same morph, but compatible across morphs [15, 16]. The floral morphology and intra-morph incompatibility are both determined by a single genetic locus, S. Plants with short-styled flowers are heterozygous (S/s) and those with long-styled flowers are homozygous recessive (s/s) at this locus [15, 17]. Thus, like other outcrossing crops such as maize (Zea mays), buckwheat is expected to maintain substantial intraspecies diversity. Indeed, isozyme analysis indicates that buckwheat has great genetic diversity, comparable to that of outcrossing wild plant species ; the heterozygosity of cultivated populations is higher than that seen in wild populations of the ancestral species, F. esculentum ssp. ancestrale, which is found in southern China. Subsequent AFLP and SSR analyses suggested that F. esculentum ssp. ancestrale is composed of two distantly related phylogenetic groups, the Tibetan and the Yunnan-Sichuan groups, and that cultivated buckwheat landraces belong to the Tibetan group [1, 19]. However, these analyses were based only on similarity of DNA banding patterns, and the average number of nucleotide differences per nucleotide site (i.e., the nucleotide diversity)  was not estimated. To date, no information on genome-wide nucleotide diversity within F. esculentum is available. Thus, there is great interest in investigating the nucleotide diversity of F. esculentum using newer and superior genomic tools.
Recently, we built a draft genome assembly of buckwheat and then applied genotyping-by-sequencing technology (GBS)  to a group of buckwheat landraces from around the world . The GBS method, which uses sequences of amplified genomic DNA fragmented by restriction enzymes, has become increasingly popular for detecting large numbers of SNPs . We used the draft genome assembly as reference for GBS markers, and successfully identified the S-allele region, which consisted of 332 scaffolds encompassing 5.4 Mbp . The region contains sites at which GBS reads were mapped in short-styled plants but not in long-styled plants, and harbors two S-allele-specific genes, S-ELF3 and SSG2, that exist only in the genomes of short-styled plants. The genotypes of short-styled and long-styled plants are thus hemizygous and null homozygous, respectively . This is similar to the situation for human sex chromosome genes; the hemizygous state of Y-chromosome genes has proved a useful feature for clarifying human phylogenetic structure . Thus, we predicted that comparisons of DNA sequences in the S-allele region would be a powerful tool for elucidating phylogenetic relationships among buckwheat landraces.
In this study, we evaluated the genome-wide nucleotide diversity within worldwide landraces of cultivated buckwheat using the published GBS data. This is the first report of genome-wide nucleotide diversity within F. esculentum. We discuss the utility of the S-allele region for phylogenetic analysis of the buckwheat species, and demonstrate possible gene flows from wild to cultivated buckwheat deduced from the diversified S-allele region.
Overall SNP detection and nucleotide diversity
Comparisons of nucleotide diversity
Scaffolds used as referenceb
Number of samples
Number of sites retained after mapping
Number of SNPs
Mean genetic diversity (pi)
Population structure and phylogenetic analysis
To clarify the population structure of buckwheat, we subjected 255,517 SNPs segregated in 46 GBS to ADMIXTURE analysis. Cross-validation error was lowest at K = 1 and gradually increased as K increased (Additional file 1: Figure S1). At K = 2, the cross-validation error was also low, indicating that the optimal number of ancestral populations was K = 1 or K = 2. Although the lowest cross-validation error was actually observed at K = 1, the NJ tree and PCA implied that there were two groups (one including only Asian landraces and the other only European landraces), indicating that K = 2 is suitable for grouping the 46 common buckwheat landraces. We refer to the two groups hereafter as the Asian and European groups.
To compare nucleotide diversity between the Asian and European groups, we identified SNPs and calculated the genome-wide mean nucleotide diversity of each group (Table 1). The SNP density was much lower in the European group (0.019) than in the Asian group (0.034). Nucleotide diversity was also lower in the European (0.0055) than in the Asian (0.065) group. However, even though the NJ tree and PCA plot indicated that there were two groups, the Fst value between Asian and European cultivars was not very high (Fst = 0.068).
Diversity of S-linked scaffolds
We previously identified the S-allele region, consisting of 332 scaffolds encompassing 5,393,196 bp, to which short-style-specific sites were mapped . To analyze the nucleotide diversity of this region, we identified SNPs on these 332 scaffolds in 23 short-styled plants. From the 60,108 sites (0.11% of the total length of the 332 scaffolds), we detected 1123 SNPs. The SNP density (0.019) and genetic diversity (0.0039) in the S-allele scaffolds were lower than those in all genomic scaffolds (Table 1).
Pairwise comparisons of the average genetic distances and Fst values between pairs of three phylogenetic groups
Genome-wide and S-allele-region-specific nucleotide diversity in common buckwheat
In this study, we obtained 255,517 genome-wide SNPs using the GBS method, which is capable of producing larger numbers of genome-wide SNPs than had been obtained in previous studies of this species. Furthermore, the published genome sequence  enabled us to estimate the nucleotide diversity, which revealed that the nucleotide diversity of common buckwheat (0.0065) was comparable to that of other outcrossing plants, such as maize (Zea mays, 0.0064)  and sunflower (Helianthus annuus; 0.0056) , and higher than that of selfing crops such as rice (Oryza sativa, 0.0024)  and soybean (Glycine max; 0.0019) . This higher nucleotide diversity is likely due to buckwheat’s outcrossing mating system and/or to gene flow from wild buckwheat, F. esculentum ssp. ancestrale.
The genetic basis of the heteromorphic SI system is similar to that of sex chromosomes in the plant and animal kingdoms. There have been numerous reports on the nucleotide diversities of sex chromosomes [28, 29, 30, 31]. In humans, for example, the ratios of the nucleotide diversities of the X and Y chromosomes to those of the autosomal chromosomes are 0.62 and 0.20, respectively . The lower nucleotide diversities observed in sex chromosomes are an expected result of the lower effective population size: the effective population sizes of the human X and Y chromosomes correspond to 3/4 and 1/4 that of autosomal chromosomes, respectively. In this study, we confirmed that the nucleotide diversity of the S-allele region is similarly lower than that of the genome as a whole: the ratio of the nucleotide diversity of the S-allele region to that of the whole genome is 0.62. That this number is much higher than the expected value (0.25) is probably due to the low density of functional genes in the S-allele region  and/or to recent multiple gene flows, as discussed in the following section. Much higher genetic diversity of S-allelic region than expected also indicates that S-allelic region is not under purifying selection.
Gene flow inferred from S-allele divergence
The phylogenetic relationships deduced from the 332 S-allele scaffolds (Fig. 3) were not congruent with those based on genome-wide SNPs (Fig. 1): three widely diverged phylogenetic groups, S-1, S-2, and S-3, were detected in the data shown in Fig. 3 (from the S-allele region) but not in Fig. 1 (from the whole genome). It is noteworthy that the three phylogenetic groups of S-allele regions are well diverged. In particular, S-3 is strongly divergent from the other two groups: the nucleotide diversity between S-3 and the other two groups is around 0.006 (Table 2). Using a rough molecular clock rate of 0.01 synonymous nucleotide substitutions per million years (e.g., 0.011 for Gossypium species and 0.016 for Arachis species) [33, 34], this would indicate that group S-3 diverged from the other groups 0.3 million years ago. The earliest plausibly identified buckwheat archeological pollen specimen, found in northern China, has been dated to 5000 to 6000 BP . Thus, it is unlikely that mutations detected in S-alleles derive from the origin of buckwheat cultivation. Considering the possibility of cross-compatibility resulting in fertile hybrids  and the overlap in habitats  between cultivated and wild buckwheat, we concluded that the diverged S-allele sequences were introgressed from wild buckwheat, F. esculentum ssp. ancestrale, and that the introgressions between wild and cultivated buckwheat are also attributable to high genome-wide diversity.
S-allele region as a phylogenetic tool for elucidating the origin and diffusion of cultivated buckwheat
ADMIXTURE and phylogenetic analyses and PCA classified 46 buckwheat cultivars into two groups, the Asian and European groups, with a low Fst value (Fst = 0.068). The European group exhibited lower nucleotide diversity than the Asian group (Table 1). From archeological studies of the pollen and macrofossil (i.e., charred seeds) records, it has been suggested that buckwheat was introduced into Europe during the period 4000–2800 BP, though it did not become a popular crop until the Late Medieval period . Both the lower nucleotide diversity in the European group and the weak differentiation between the Asian and European groups are likely due to genetic bottlenecks associated with dispersion from Asia to Europe, and/or the recent intensified cultivation and selection. The low genetic variation and loss of alleles detected by isozyme analysis also support this hypothesis .
Y chromosomes, which retain sequential records of the accumulation of nucleotide diversity, have been used to detect ancestral haplotypes and to trace human migrations . Although genome-wide SNPs have illuminated the population structure and population differentiation of cultivated buckwheat, the SNPs in S-linked scaffolds may shed light on different aspects of the diffusion history of the species. In this study, we detected three major S-allele groups, S-1, S-2, and S-3 (Fig. 3). The phylogenetic relationship implied by S-linked scaffolds suggested the introgressions from wild relatives as discussed above. This is due to the suppression of recombination in S-allelic region like sex chromosomes. Thus, phylogenetic analysis using S-allelic region is expected to offer better understanding of the origin and global diffusion of cultivated buckwheat.
In this study, we did not find type S-1 S-allele sequences in the populations from Nepal and Bhutan. Based on our identification of the three diverged S-allele groups, the possibility that cultivated buckwheat had multiple origins should also be considered. Considering that remains of buckwheat seeds appeared in west-central Nepal from 3000 BP , we should consider the possibility that buckwheat was independently domesticated around the Himalayan region, including Nepal, Bhutan, and southwestern China. To elucidate the origin and global diffusion history of cultivated buckwheat, we would need to expand our sample set to include many more samples of cultivated buckwheat from around the world, as well as samples of wild buckwheat, F. esculentum ssp. ancestrale, from China.
Based on genome-wide SNPs obtained using GBS technology, we successfully estimated the nucleotide diversity (0.0065) of buckwheat, which is comparable to that of other outcrossing plants, such as maize and sunflower. Phylogenetic analyses based on genome-wide SNPs also showed that cultivated buckwheat is composed of two groups, Asian and European. The low nucleotide diversity of the European group and the low differentiation between the Asian and European groups are consistent with genetic bottlenecks associated with dispersion from Asia to Europe, and/or the recent intensified cultivation and selection of buckwheat in Europe. These results based on genome-wide SNPs are congruent with those of previous studies based on isozyme variation. The nucleotide diversity (0.0039) estimated using SNPs in the S-allele region was lower than that estimated using genome-wide SNPs, reflecting the smaller population size of the S-allele region as compared to the genome as a whole. The data also indicated the likelihood of gene flow from wild to cultivated buckwheat and the possibility of multiple origins for cultivated buckwheat. In conclusion, phylogenetic analysis using the S-allele region can offer a better understanding of the origin of cultivation and the global diffusion history of cultivated buckwheat.
Genotype-by-sequencing data and SNP detection
We used published GBS reads obtained using EcoRI and MseI restriction enzymes (DRA accession number DRA004489, ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/DRA004/DRA004489) . The GBS dataset represented 23 short-styled and 23 long-styled buckwheat landraces originating from a wide range of locations within Eurasia (Additional file 3: Table S1). Low-quality reads and adaptors were trimmed using Trimmomatic-0.36  with the options SLIDINGWINDOW:5:25 and MINLEN:40. The adaptor sequences used were CACGACGCTCTTCCGATCT and ACCGCTCTTCCGATCTGTAA. The trimmed paired-end (PE) reads were mapped against the buckwheat reference sequence (FES_r1.0) using BWA 0.7.15 (Li and Durbin 2009)  with the -L 10 and -B 10 options. Single-end reads were filtered from BAM files using samtools 1.3.1. . Variants were called by the samtools mpileup function and bcftools implemented in samtools 1.3.1. Sites with depths of less than 4 and more than 40 were converted to missing data using VCFtools 0.1.13 . Then, sites with proportions of missing data greater than 0.2 were filtered using VCFtools 0.1.13.
Construction of phylogenetic tree and structure analysis
ADMIXTURE v1.22  was used to investigate the population structure of the 46 common buckwheat landraces. For each value of K, ten ADMIXTURE analysis runs were performed with different random seeds. The best run was selected according to the highest value of log likelihood. A neighbor-joining (NJ) tree  was constructed using SEQBOOT with 100 replicates, followed by the DNAdist (with the Kimura two-parameter method), Neighbor, and Consense programs from the PHYLIP package 3.6 . The NJ tree was rooted using midpoint rooting and visualized with FigTree 1.4.2 . Principal-component analysis (PCA) based on covariance was performed using Tassel 5.2.37 .
The nucleotide diversity and F-statistics
Sites with a proportion of missing data greater than 0.2 were filtered using VCFtools 0.1.13. Then, the nucleotide diversity within species and each classification/group was calculated using VCFtools 0.1.13. Weir and Cockerham’s weighted F-statistics (Fst)  was calculated using VCFtools 0.1.13.
We thank T. Ota and H. Hunt for valuable suggestions, and K.L. Farquharson for language-editing support of the manuscript. This study was supported by JSPS KAKENHI (Grant Numbers 18H02177 and 18KK0172).
This study was supported by JSPS KAKENHI (Grant Numbers 18H02177 and 18KK0172).
The funding body had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.
Availability of data and materials
The Illumina reads used in the GBS analyses are available in the DDBJ Sequence Read Archive (DRA) (The DRA accession number is DRA004489, ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/DRA004/DRA004489).
NM analyzed and interpreted the GBS data, and was major contributor in writing the manuscript. YY conceived of the study, participated in its design, performed analyses, and wrote the manuscript. Both authors read and approved the final manuscript.
Laboratory of Crop Evolution, Division of Applied Bioscience, Graduate School of Agriculture, Kitashirakawa oiwake-cho, Sakyou-ku, Kyoto 606–8501, JAPAN.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 2.Ohnishi O. On the origin of cultivated common buckwheat based on allozyme analyses of cultivated and wild populations of common buckwheat. Fagopyrum. 2009;26:3–9.Google Scholar
- 11.Yasui Y, Hirakawa H, Ueno M, et al. Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes. DNA res. 2016. 2016;23:215–24.Google Scholar
- 13.Katsu K, Suzuki R, Tsuchiya W, et al. A new buckwheat dihydroflavonol 4-reductase (DFR), with a unique substrate binding structure, has altered substrate specificity. BMC Plant biol. 2017. 2017;17:239.Google Scholar
- 14.Matsui K, Oshima Y, Mitsuda N, Sakamoto S, Nishiba Y, Walker AR, Ohme-Takagi M, Robinson SP, Yasui Y, Mori M, Takami H. Buckwheat R2R3 MYB transcription factor FeMYBF1 regulates flavonol biosynthesis. Plant Sci. 2018;21(9):276.Google Scholar
- 16.Ueno M, Yasui Y, Aii J, Matsui K, Sato S, Ota T. Genetic analyses of the heteromorphic self-incompatibility (S) locus in buckwheat. In: Zhou M, editor. Molecular breeding and nutritional aspects of buckwheat: Academic Press; 2016. p. 411–22.Google Scholar
- 17.Garber R, Quisenberry KS. Self-fertilization in buckwheat. J Agric Res. 1927;34:185–90.Google Scholar
- 20.Nei M. Molecular evolutionary genetics. New York: Columbia Univ. Press; 1987.Google Scholar
- 35.Hunt HV, Shang X, Jones MK. Buckwheat: a crop from outside the major Chinese domestication centres? A review of the archaeobotanical, palynological and genetic evidence. Veg Hist Archaeobot. 2018;27:493–506.Google Scholar
- 36.Ohnishi O. Non-shattering habit gene (sht), chlorophyll-deficient and other detrimental genes concealed in natural populations of the wild ancestor of common buckwheat. Fagopyrum. 1999;16:23–8.Google Scholar
- 47.PHYLIP package 3.6. http://www.evolution.genetics.washington.edu/phylip.htlm
- 48.FigTree 1.4.2. http://tree.bio.ed.ac.uk/software/figtree/
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.