Introduction

Soybean (Glycine max [L.] Merr.) plays a key role in food security by supplying high-quality vegetable protein and oil. World soybean production has increased steadily over recent decades. During the previous five years, soybean was grown on 122–127 million ha of arable land with a total annual harvest of 335–360 million metric tons (FAOSTAT 2022). At present, North and South America are the main soybean producers and exporters, whereas China and Europe are the main importing regions (Hartman et al. 2011). China is the homeland and center of diversity of soybean. However, worldwide dissemination, specific adaptation over many decades and continuous plant breeding have given rise to distinct regional populations in the different geographic regions where soybean is grown (Liu et al. 2020a).

Genetic diversity is the major driver of response to selection in breeding programs. Genetic diversity has nevertheless decreased continuously through soybean domestication, the early stages of landrace development, the worldwide dispersal of plant introductions and modern cultivar development. This loss resulted from selection and from the emergence of important domestication-related traits (Hyten et al. 2006; Sedivy et al. 2017). The reduction of diversity has frequently been expressed as a loss of rare alleles. Recently, however, an analysis of a soybean pan-genome assembled from over 1100 accessions also revealed significant gene loss in elite soybean cultivars compared with old cultivars, landraces or wild accessions (Bayer et al. 2022). This gene loss took the form of presence/absence variation and reduced genome size. It suggests that modern plant breeding has even selected for the absence of dispensable genes. Consequently, monitoring and comparing genetic diversity is becoming increasingly important for ensuring future breeding progress in soybean yield and adaptation to specific environmental conditions.

Studies of soybean genetic resources and germplasm accessions have found a high level of diversity in Chinese soybeans. These were structured into seven primary ecotypes representing the different soybean growing regions of China (Wang et al. 2006). In contrast, North American soybeans have a narrow genetic base tracing back to only a small number of ancestral plant introductions and their progeny (Gizlice et al. 1994). Even so, diversity there is also strongly structured according to growing region (Gizlice et al. 1996). In Europe, early maturity soybean accessions from northern Europe are related to chilling-tolerant plant introductions similar to Canadian genotypes (Yamaguchi et al. 2018). Southern European soybeans, by contrast, frequently have a northern United States (US) genetic background (Hrustic and Miladinovic 2011; Tomicic et al. 2015). A number of individual studies using different genomic tools and marker systems have reported genetic diversity in soybean germplasm and breeding materials from France (Tavaud-Pirra et al. 2009), Poland (Czembor et al. 2021), Central Europe (Hahn and Würschum 2014) and Central-South Europe (Žulj Mihaljević et al. 2020). A comprehensive European analysis, however, is still missing.

In contrast to the diversity present in genetic resource collections, genetic diversity at the level of elite cultivars is of special interest in plant breeding. It represents the latest breeding developments and the actual agrobiodiversity in farmers' fields. Liu et al. (2017) compared present-day Chinese and US elite soybean cultivars and made three findings. First, genetic diversity was larger in the Chinese than in the US cultivars investigated. Second, the genetic bases of the two populations differed clearly owing to selection for traits related to environmental adaptation and yield components. Third, they described signals of selection that might be associated with important physiological properties. Likewise, a comparative analysis of genetic diversity between Chinese and European soybean cultivars would be of great interest for studying the effects of long-term separate selection under divergent environmental and agronomic conditions. In practical soybean breeding, moreover, Chinese and European breeding programs could enrich each other's diversity to better meet future challenges of soybean production. For the present study, two representative sets of early maturity Chinese and European elite soybean cultivars were therefore assembled. A total of 156 cultivars were genotyped with a 200 K SNP array and, in addition, with 71 simple sequence repeat (SSR) markers. The research objectives were (1) to compare the genetic diversity patterns within and between Chinese and European elite soybean cultivars, (2) to study diversity at the level of maturity groups, and (3) to investigate the genetic structure within the cultivars from the two regions.

Materials and methods

Plant materials

A total of 156 early maturity elite soybean cultivars of either Chinese (CN) or European (EU) origin was assembled for the present investigation (see Suppl. Table 1 for descriptive information and Suppl. Table 2 for an overview). The European soybeans (77 genotypes from 10 countries) represented maturity groups 0000–II. The Chinese soybeans (79 genotypes) were classified into maturity groups 000–III, corresponding to the Northeastern Spring (NEsp) or Northern Spring (Nsp) Chinese soybean ecotypes. To handle and compare similar numbers of genotypes in subsets, each cultivar was assigned to one of four experimental groups according to its maturity group classification (Suppl. Table 2).

DNA extraction

For extraction of genomic DNA for SSR analysis, six seeds per genotype were germinated on filter paper. After three days, root tips (1 cm long) of 3–6 seedlings per genotype were pooled in 1.5 ml tubes. They were then dried for two days at room temperature in plastic bags with silica gel. The dried root tips were finely ground with 2 mm glass beads in a ball mill (Retsch GmbH, Haan, Germany). DNA was extracted using the Wizard® DNA extraction kit (Promega Corp., Madison, WI, USA) according to the manufacturer's protocol and stored at −20 °C until further processing. For SNP analysis, 10 fresh leaves per genotype were frozen in liquid nitrogen and ground with a mortar and pestle. Genomic DNA was then extracted according to the protocol described by Kisha et al. (1997).

SSR genotyping

A total of 67 unlinked microsatellite loci distributed across all 20 soybean linkage groups were chosen for analysis. Data from the four E gene loci E1, E2, E3 and E4 were added to achieve wider genome coverage, so that 2 to 4 loci per linkage group were available for analysis. Primer sequences for each locus, including the E genes, are available from Žulj Mihaljević et al. (2020), Kurasch et al. (2017) or SoyBase (2022). Further information on all loci is given in Suppl. Table 3 and in the SoyBase database (SoyBase 2022).

Each PCR reaction had a total volume of 10 µl and contained:
- 3.3 µl DNA (10 ng/µl);
- 1 × GoTaq® Green Master Mix (Promega);
- 0.25 pmol forward primer with an M13 tail (5′-CCCAGTCACGACGTTG-3′) added to its 5′ end;
- 2.5 pmol reverse primer;
- 2.25 pmol fluorescently labelled M13 tail (FAM, Cy5), synthesized by MWG (Ebersberg, Germany).

A two-step touchdown PCR was performed. It started with initial denaturation at 95 °C for 2 min. The first step consisted of seven cycles of 45 s at 94 °C, 45 s at 68 °C and 60 s at 72 °C, with the annealing temperature decreasing by 2 °C in each cycle. In the second step, products were amplified for 30 cycles of 45 s at 94 °C, 45 s at 50 °C and 60 s at 72 °C. A final extension at 72 °C for 5 min completed the program.
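The annealing profile of this two-step program can be tabulated cycle by cycle. The sketch below is illustrative only. It assumes that the first of the seven touchdown cycles anneals at 68 °C and that each following cycle anneals 2 °C lower, so the last touchdown cycle runs at 56 °C. The helper name and its parameters are hypothetical.

```python
def touchdown_schedule(start_temp=68.0, step=2.0, touchdown_cycles=7,
                       final_temp=50.0, final_cycles=30):
    """Return the annealing temperature (°C) for every cycle of a two-step
    touchdown PCR (hypothetical helper; defaults follow the program above).

    Assumption: the first touchdown cycle anneals at `start_temp` and each
    following touchdown cycle is `step` degrees cooler.
    """
    touchdown = [start_temp - step * i for i in range(touchdown_cycles)]
    return touchdown + [final_temp] * final_cycles


schedule = touchdown_schedule()
print(schedule[:7])   # touchdown phase: 68.0 down to 56.0 in 2 °C steps
print(len(schedule))  # 7 touchdown + 30 amplification cycles
```

Keeping the two phases as separate parameters makes it easy to check how many cycles fall above the final annealing temperature when the program is adapted.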

The PCR amplification products were separated on 12% polyacrylamide gels in 1 × TBE buffer using a C.B.S. electrophoresis chamber (C.B.S. Scientific Inc., Del Mar, CA, USA). Electrophoresis was run at a constant 400 V and 10 °C for 2 h. Gels were scanned in fluorescence mode with a Typhoon scanner (GE Healthcare, Uppsala, Sweden). SSR alleles were scored manually for each individual allele as "1" (present) or "0" (absent). In total, 67 SSR markers with 379 alleles (2–11 per locus) were evaluated and compiled into a binary (0/1) table. Data from the E gene analysis were also included, using all 14 possible alleles of the four E genes.
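The manual scoring described above yields a binary presence/absence table with one column per observed allele. The minimal sketch below shows how such a table can be assembled from allele calls. It assumes that calls are stored per cultivar as a set of allele labels for each locus, with labels of one comparable type such as fragment sizes. The function name, locus names and allele sizes in the example are hypothetical and are not taken from the study.

```python
def presence_absence_table(calls):
    """Build a 0/1 allele table (hypothetical helper).

    calls: {cultivar: {locus: set of allele labels}}; a heterogeneous
    cultivar may carry more than one allele at a locus.
    Returns (column_names, {cultivar: row of 0/1 values}) with one column
    for every locus/allele combination observed anywhere in the data.
    """
    columns = sorted({(locus, allele)
                      for loci in calls.values()
                      for locus, alleles in loci.items()
                      for allele in alleles})
    table = {
        cultivar: [1 if allele in loci.get(locus, set()) else 0
                   for locus, allele in columns]
        for cultivar, loci in calls.items()
    }
    names = [f"{locus}_{allele}" for locus, allele in columns]
    return names, table


# Hypothetical allele sizes (bp) at two hypothetical loci
calls = {
    "cv1": {"locus_A": {198}, "locus_B": {110}},
    "cv2": {"locus_A": {204}, "locus_B": {110}},
    "cv3": {"locus_A": {198}, "locus_B": {116}},
}
names, table = presence_absence_table(calls)
print(names)         # ['locus_A_198', 'locus_A_204', 'locus_B_110', 'locus_B_116']
print(table["cv1"])  # [1, 0, 1, 0]
```

The number of columns equals the total number of distinct alleles scored. A missing call at a locus produces zeros in all of that locus's columns, so missing data would need to be flagged separately before distances are computed.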

SNP genotyping and filtering

The 156 soybean cultivars were further genotyped with a 200 K SNP array. Genotyping was carried out by COMPASS BIOTECHNOLOGY (Beijing Compass Biotechnology Co., Ltd., Beijing, China) on the Illumina iScan platform (Illumina, Inc., San Diego, CA, USA). It used a BeadChip with 159,072 SNP markers selected from the No. 1 Zhongdouxin Soybean Breeding Array ZDX1 (Sun et al. 2022) and followed the protocol of the Infinium HD Assay Super Manual. SNP alleles were called with the GenomeStudio genotyping software (V2011.1, Illumina, Inc.). The SNP data set was filtered with PLINK 1.9, removing:
- variants with a missing rate greater than 0.01;
- samples with a missing rate greater than 0.05;
- variants with a minor allele frequency (MAF) below 0.05.

Cultivars 100 (Heinong 61), 129 (Heinong 63) and 94 (Mengdou 30) were excluded from further analysis because of poor data quality. In total, 61,316 SNP markers and 153 genotypes passed quality control.
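The three thresholds above correspond to PLINK's `--geno 0.01`, `--mind 0.05` and `--maf 0.05` options. The pure-Python sketch below illustrates what these filters do on a genotype matrix coded as 0/1/2 allele dosages, with `None` for missing calls. The function name is hypothetical. The sketch applies the filters in a fixed order (variants, then samples, then MAF), which need not match PLINK's internal order of operations, so counts on real data may differ slightly.

```python
def qc_filter(genotypes, geno=0.01, mind=0.05, maf=0.05):
    """Sketch of call-rate and MAF filtering (hypothetical helper).

    genotypes: list of samples, each a list of allele dosages (0, 1, 2)
    or None for a missing call, all of equal length.
    Returns (kept_sample_indices, kept_variant_indices).
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])

    # 1) Drop variants whose missing rate exceeds `geno`.
    variants = [j for j in range(n_variants)
                if sum(g[j] is None for g in genotypes) / n_samples <= geno]
    if not variants:
        return [], []

    # 2) Drop samples whose missing rate over the remaining variants exceeds `mind`.
    samples = [i for i in range(n_samples)
               if sum(genotypes[i][j] is None for j in variants) / len(variants) <= mind]

    # 3) Drop variants whose minor allele frequency is below `maf`.
    kept = []
    for j in variants:
        calls = [genotypes[i][j] for i in samples if genotypes[i][j] is not None]
        if calls:
            p = sum(calls) / (2 * len(calls))  # frequency of the counted allele
            if min(p, 1 - p) >= maf:
                kept.append(j)
    return samples, kept


# Toy data: 4 samples x 4 variants (hypothetical)
genotypes = [
    [0, 2, 2, None],
    [0, 2, 0, 2],
    [0, 0, 2, 2],
    [0, 2, 2, 2],
]
print(qc_filter(genotypes))  # ([0, 1, 2, 3], [1, 2])
```

In the toy data, variant 3 fails the call-rate filter and the monomorphic variant 0 fails the MAF filter. This matters for inbred cultivars, where many array SNPs can be fixed within a population.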

Statistical data analysis

Genetic diversity parameters, the informativeness of the SSR loci and SSR-based genetic distances were calculated with PowerMarker 3.25 (Liu and Muse 2005) and GenAlEx 6.51 (Peakall and Smouse 2012). Population genetic parameters for both SSR and SNP data, and the analysis of molecular variance (AMOVA), were estimated with the R package poppr (Kamvar et al. 2015) in R version 4.1.2 (R Core Team 2022). Neighbor-joining trees were constructed from genetic distance matrices with the R package ape 5.0 and visualized with iTOL 6.4.3 (Letunic and Bork 2021). Principal component analysis (PCA) was carried out with poppr for the SSR data and with PLINK 1.9 (Chang et al. 2015) for the SNP data. Ten components were calculated for each marker type. The first two components were used for graphical visualization of genetic diversity with ggplot2 (Wickham 2009). Population structure was analyzed with STRUCTURE 2.3.1 (Pritchard et al. 2000) for the SSR data and with ADMIXTURE (Alexander et al. 2009) for the SNP data. The number of assumed ancestral populations (K) ranged from 1 to 10. Bar graphs of population structure were drawn with the pophelper package (Francis 2017). Linkage disequilibrium (LD) analysis was performed with PopLDdecay (Zhang et al. 2019). LD (r²) was calculated for all pairwise combinations of SNPs located at most 1000 kb apart.
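As an illustration of the LD quantity described above, the sketch below computes r² as the squared Pearson correlation of allele dosages between two SNPs. This is a common estimator for unphased data, but it is an assumption here: PopLDdecay's exact estimator is not reproduced. The sketch then collects all SNP pairs on the same chromosome that lie no more than 1000 kb apart. Function names and the toy SNP records are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation of two SNPs' allele dosages.

    Samples missing (None) at either SNP are skipped. Returns None when r2
    is undefined (fewer than two shared calls or a monomorphic SNP).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def pairwise_ld(snps, max_dist_bp=1_000_000):
    """Yield (distance_bp, r2) for SNP pairs on the same chromosome that are
    at most max_dist_bp apart (1000 kb by default).

    snps: list of (chromosome, position_bp, dosages) records.
    """
    for (c1, p1, g1), (c2, p2, g2) in combinations(snps, 2):
        if c1 == c2 and abs(p1 - p2) <= max_dist_bp:
            r2 = r_squared(g1, g2)
            if r2 is not None:
                yield abs(p1 - p2), r2


# Hypothetical SNP records: (chromosome, position, dosages of 4 cultivars)
snps = [
    ("Gm01", 100, [0, 0, 2, 2]),
    ("Gm01", 50_100, [0, 0, 2, 2]),
    ("Gm01", 2_000_000, [0, 2, 0, 2]),
    ("Gm02", 100, [0, 0, 2, 2]),
]
print(list(pairwise_ld(snps)))  # [(50000, 1.0)]
```

The all-pairs loop is quadratic in the number of SNPs. For tens of thousands of markers, one would instead sort SNPs by position within each chromosome and only compare neighbors inside the distance window.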

Results

Genetic diversity between Chinese and European cultivars

The elite soybean cultivars were clearly divided into two major groups corresponding to their regions of origin, China and Europe (Fig. 1a and b). Both marker systems revealed comparable major groupings. This regional separation was also confirmed by the distribution of genotypes in principal component analysis (PCA) for SSR markers (Fig. 2a) and for the SNP array (Fig. 2b).

As the two dendrograms show, clustering within regions is largely based on maturity classification (text background color in Fig. 1a and b). The early maturity experimental group (MG 0000, 000, 00) is clearly separated from later groups, particularly among the European cultivars and most clearly in the SNP analysis (Fig. 1b). In addition to maturity, European cultivars tended to cluster by country. In both the SSR- and SNP-based dendrograms, Italian, Austrian, Serbian or Hungarian cultivars tended to fall into separate clusters, reflecting particular breeding programs and genetic similarity within programs. Chinese cultivars, in contrast, were partly clustered according to experimental group. However, their clustering by institution (i.e. individual breeding program) was less reproducible across the two marker systems than in the European cultivars.

Both neighbor-joining dendrograms seemingly place the European cultivars as a sub-cluster within a Chinese cluster, but this view is not supported by the PCA. Moreover, the Chinese cluster to which the European cultivars appear to be joined consists of different genotypes in the SSR- and SNP-based dendrograms. This indicates that the neighbor-joining trees do not reveal particular Chinese ancestors of the European cultivars.

Fig. 1

Neighbor-joining trees illustrating genetic relationships between elite soybean cultivars. Branch color represents geographic origin (China, CN, in red; Europe, EU, in blue). Text background color indicates experimental group (representing maturity groups), and the outer ring indicates country of origin. a Dendrogram of 156 cultivars based on genetic distances from 71 SSR markers. b Dendrogram of 153 cultivars based on genetic distances from the 200 K SNP array

Fig. 2

Scatter plots of principal component analyses (PCA) of Chinese (CN) and European (EU) elite soybean cultivars based on a SSR markers and b the 200 K SNP array

The two dendrograms in Fig. 1 suggest slightly greater diversity within the European than within the Chinese set of cultivars, as indicated by the longer tree branches of the European cultivars. Similarly, the Chinese cultivars are more densely clustered in the PCA (Fig. 2), again suggesting comparatively higher diversity among the European cultivars. Differences in diversity are also reflected in the average genetic distances within and between the experimental groups of the two regions, shown in Suppl. Table 5 for SSR-derived distances. For experimental groups 1 and 4, distances are larger within the European than within the Chinese cultivars. For groups 2 and 3, average distances within the Chinese experimental groups are slightly higher. As expected, average genetic distances are highest between experimental groups from different regions (Suppl. Table 5). A similar pattern was found with SNP-array distances (detailed results not shown).

All pairwise genetic distances between cultivars, calculated from either the SSR markers or the SNP array, were also used to examine the concordance between the two marker systems. The two distance measures are clearly correlated, showing that the marker systems agree in estimating distances between genotypes (Suppl. Figure 1). This agreement is even closer for average distances within or between European and Chinese experimental groups (r = 0.965, Suppl. Figure 2).
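Concordance between SSR- and SNP-based distances can be quantified by correlating the corresponding off-diagonal entries of the two distance matrices. The sketch below does this in plain Python. It also adds a Mantel-style permutation test as an assumption of this sketch, because pairwise distances that share a cultivar are not independent and an ordinary correlation p-value would therefore be misleading. Both matrices must list the same cultivars in the same order. Function names and the example matrices are hypothetical.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matrix_correlation(d1, d2):
    """Pearson r between corresponding pairwise distances of two matrices."""
    return pearson(upper_triangle(d1), upper_triangle(d2))


def mantel(d1, d2, permutations=999, seed=1):
    """Observed r and a one-sided permutation p-value, obtained by jointly
    permuting the rows and columns of d2."""
    r_obs = matrix_correlation(d1, d2)
    rng = random.Random(seed)
    order = list(range(len(d2)))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d2[a][b] for b in order] for a in order]
        if matrix_correlation(d1, permuted) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical 3-cultivar distance matrices
d_ssr = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d_snp = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(matrix_correlation(d_ssr, d_snp))  # 1.0
```

With only three cultivars the permutation test has almost no power. In practice it is meaningful for matrices of the size used here (over 150 cultivars).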

The diversity pattern among genotypes was further investigated by analysis of molecular variance (AMOVA, Tables 1 and 2). In both marker systems, the percentage of variation was higher within than between populations, for both geographic origin and maturity group. This indicates that additional factors contribute to the variation.

Table 1 Analysis of molecular variance (AMOVA) for geographic origin (CN vs. EU) and soybean maturity groups based on SSR markers
Table 2 AMOVA for geographic origin (CN vs. EU) and soybean maturity groups based on SNP data
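For orientation, a textbook one-level AMOVA partitions squared pairwise distances into among-group and within-group sums of squares. Sums of squares are the summed squared distances over pairs divided by the number of individuals involved. The among-group variance component is (MS_among - MS_within) / n0, where n0 is a group-size correction, and the percentages of variation follow from the two components. The sketch below implements this decomposition for a single grouping factor, such as region or maturity group. It is not poppr's implementation: poppr's choices of distance, handling of missing data, hierarchical strata and correction of negative components may differ. The sketch also assumes at least two groups and more individuals than groups. The function name and data are hypothetical.

```python
def amova_one_level(dist, groups):
    """One-level AMOVA from a symmetric distance matrix (hypothetical helper).

    dist: pairwise distances between individuals (squared internally);
    groups: group label per individual, same order as dist (>= 2 groups).
    Returns variance components, their percentages and Phi_ST.
    """
    n = len(dist)
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)

    def ss(idx):
        # Sum of squares = (sum of squared distances over pairs) / group size
        return sum(dist[i][j] ** 2
                   for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)

    ss_total = ss(list(range(n)))
    ss_within = sum(ss(idx) for idx in members.values())
    ss_among = ss_total - ss_within

    p = len(members)
    ms_among = ss_among / (p - 1)
    ms_within = ss_within / (n - p)
    # n0: average group size corrected for unequal group sizes
    n0 = (n - sum(len(idx) ** 2 for idx in members.values()) / n) / (p - 1)

    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    total = var_among + var_within
    return {"sigma2_among": var_among, "sigma2_within": var_within,
            "pct_among": 100 * var_among / total,
            "pct_within": 100 * var_within / total,
            "phi_st": var_among / total}


# Hypothetical one-dimensional "genotypes": two tight groups far apart
x = [0, 1, 10, 11]
dist = [[abs(a - b) for b in x] for a in x]
print(amova_one_level(dist, ["CN", "CN", "EU", "EU"]))
```

In the toy example almost all variation lies among groups. The AMOVA results reported above show the opposite pattern: most variation lies within populations.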

Allelic diversity

Across the 71 SSR loci, 392 different alleles were detected among all 156 cultivars. Both the number of alleles and the number of private (unique) alleles were higher in the Chinese than in the European population (Table 3). With respect to maturity group, the number of alleles was highest for maturity groups 0 and I. It was lower for the early maturity groups 000 and 00 and for the later maturity group II. Suppl. Table 3 describes each SSR locus, including its number of alleles and other parameters of genetic diversity and informativeness. The average number of alleles per SSR locus was 5.5, with a maximum of 11 alleles at one locus. SNP-derived population genetic estimators were similar for European and Chinese cultivars, except for the index of association (Ia) (Table 4).

Table 3 Comparison of population genetic estimators based on SSR markers for geographical origin and maturity group of elite soybean cultivars
Table 4 Comparison of population genetic estimators between European and Chinese elite soybean cultivars based on SNP data
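Per-locus informativeness is usually summarized by the number of alleles, gene diversity and the polymorphism information content (PIC). The sketch below derives these quantities from allele frequencies. It assumes that each largely homozygous cultivar contributes one allele call per locus. Gene diversity is computed as 1 - sum(p_i^2) without a small-sample correction. PIC uses the standard formula 1 - sum(p_i^2) - sum over i < j of 2 p_i^2 p_j^2. Values may therefore differ somewhat from the PowerMarker output summarized in the tables. The function name is hypothetical.

```python
from collections import Counter


def locus_stats(alleles):
    """Allele number, gene diversity and PIC for one SSR locus.

    alleles: one allele call per cultivar (None = missing); cultivars are
    assumed homozygous, so each contributes a single allele.
    """
    calls = [a for a in alleles if a is not None]
    counts = Counter(calls)
    n = len(calls)
    freqs = [c / n for c in counts.values()]
    # Gene diversity without small-sample correction
    gene_div = 1 - sum(p * p for p in freqs)
    # PIC subtracts the probability of uninformative heterozygous matings
    pic = gene_div - sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                         for i in range(len(freqs))
                         for j in range(i + 1, len(freqs)))
    return {"n_alleles": len(counts), "gene_diversity": gene_div,
            "pic": pic, "major_allele_freq": max(freqs)}


print(locus_stats(["a", "a", "b", "b", None]))
# {'n_alleles': 2, 'gene_diversity': 0.5, 'pic': 0.375, 'major_allele_freq': 0.5}
```

A biallelic locus such as SacK149 can reach at most PIC = 0.375, whereas multi-allelic loci such as Satt281 can approach higher values. This helps explain why allele-rich SSR loci dominate informativeness rankings.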

Genetic structure

The population structure analysis based on SNP data is presented in Fig. 3. At both K = 2 and K = 5 (minimum cross-validation error), the separation between the two regions is clearly evident. Chinese cultivars no. 102 (Heinong 51), 108 (Suinong 24) and 81 (Dongnong 50) were identified as containing a higher proportion of European ancestry. In Fig. 2b, these cultivars also appear as outliers of the Chinese cluster, located closest to the European cluster. Likewise, cultivars no. 40 (NS Kaca) and 11 (Atlanta) contained the largest proportion of Chinese ancestry among all European cultivars. Moreover, at K = 5 the European subpopulation is structured into two ancestral lines (dark blue and green bars), whereas the Chinese subpopulation is divided into three (yellow, red and light blue bars). For comparison, an additional structure analysis was carried out with SSR data (Suppl. Figure 3) for K = 2 and K = 5. There, too, the cultivars are clearly separated according to their region of origin at K = 2.

Fig. 3

Analysis of population genetic structure of 153 soybean cultivars based on 200 K SNP array data. a Line graph of cross-validation (CV) errors for K = 1–10, with the minimum at K = 5. b Population structure at K = 2 (top) and K = 5 (bottom), indicating each cultivar's proportion of membership in two or five hypothetical subpopulations, respectively
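Two steps of this structure analysis can be sketched. The first chooses K as the value with the lowest cross-validation error. The second ranks the cultivars of one region by their membership in the ancestry component that dominates the other region, which is how cultivars with admixed ancestry can be identified. The sketch assumes that CV errors (e.g. from ADMIXTURE runs with `--cv`) have already been collected into a dictionary. It also assumes that membership proportions are available as rows of an ADMIXTURE-style Q matrix. Function names and all numbers in the example are hypothetical.

```python
def best_k(cv_errors):
    """Return the K with the lowest cross-validation error.

    cv_errors: {K: CV error}, one entry per ADMIXTURE run.
    """
    return min(cv_errors, key=cv_errors.get)


def cross_region_ancestry(q_rows, regions, focal="CN", other="EU"):
    """Rank focal-region cultivars by their membership in the ancestry
    component with the highest mean membership in the other region.

    q_rows: membership proportions per cultivar (rows of a Q matrix);
    regions: region label per cultivar, in the same order.
    Returns [(cultivar_index, proportion), ...] in descending order.
    """
    k = len(q_rows[0])
    other_idx = [i for i, r in enumerate(regions) if r == other]
    mean_q = [sum(q_rows[i][c] for i in other_idx) / len(other_idx)
              for c in range(k)]
    component = max(range(k), key=lambda c: mean_q[c])
    focal_rows = [(i, q_rows[i][component])
                  for i, r in enumerate(regions) if r == focal]
    return sorted(focal_rows, key=lambda t: t[1], reverse=True)


cv = {1: 0.60, 2: 0.55, 3: 0.53, 4: 0.52, 5: 0.50, 6: 0.51}  # hypothetical
print(best_k(cv))  # 5
q = [[0.95, 0.05], [0.70, 0.30], [0.10, 0.90], [0.02, 0.98]]  # hypothetical, K = 2
print(cross_region_ancestry(q, ["CN", "CN", "EU", "EU"]))  # [(1, 0.3), (0, 0.05)]
```

Identifying the "European" component from the data, rather than by column position, matters because ADMIXTURE's cluster labels are arbitrary and can change between runs.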

LD decay

Average LD decay across all chromosomes was calculated for the whole cultivar set and separately for the Chinese and European subpopulations (Fig. 4). In the European subpopulation, LD decayed to half its initial value at a shorter distance (110 kbp) than in the Chinese subpopulation (135 kbp).
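Half-decay distances such as these are read off a curve of mean r² against physical distance. The minimal sketch below makes two assumptions. First, pairwise (distance, r²) values are averaged in fixed-width distance bins. Second, the half-decay point is where the binned mean first falls to half of its maximum, interpolated linearly between bins. PopLDdecay and the plotting behind Fig. 4 may bin and smooth differently. The bin width, function names and toy curve are hypothetical.

```python
def binned_mean_r2(pairs, bin_bp=10_000):
    """Average r2 in fixed-width distance bins.

    pairs: iterable of (distance_bp, r2). Returns a list of
    (bin_midpoint_bp, mean_r2) sorted by distance.
    """
    sums, counts = {}, {}
    for dist, r2 in pairs:
        b = dist // bin_bp
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    return [((b + 0.5) * bin_bp, sums[b] / counts[b]) for b in sorted(sums)]


def half_decay_distance(curve):
    """Distance at which mean r2 first falls to half of its maximum.

    curve: list of (distance, mean_r2) sorted by distance. The crossing is
    interpolated linearly between neighbouring bins; returns None if the
    curve never drops to half of its peak.
    """
    peak = max(range(len(curve)), key=lambda i: curve[i][1])
    half = curve[peak][1] / 2
    if half <= 0:
        return None
    for (d0, r0), (d1, r1) in zip(curve[peak:], curve[peak + 1:]):
        if r1 <= half:
            return d0 + (r0 - half) * (d1 - d0) / (r0 - r1)
    return None


# Hypothetical binned curve: (distance in bp, mean r2)
curve = [(5_000, 0.8), (15_000, 0.6), (25_000, 0.3), (35_000, 0.2)]
print(round(half_decay_distance(curve)))  # 21667
```

Because the result depends on the bin width and on how the peak is defined, half-decay distances are best compared between subpopulations analyzed with identical settings, as in the comparison above.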

Discussion

Genetic and allelic diversity

Elite soybeans from the Northeastern Spring and Northern Spring regions of China and early maturity European elite soybean cultivars are genetically separated from each other. SNP and SSR marker analyses revealed this separation independently (Figs. 1 and 2). This clearly indicates that the two sets represent different world soybean populations, as proposed by Liu et al. (2020a). Liu et al. (2020a) classified world soybeans on the basis of historical dissemination patterns and phylogeographic relationships. In their phenotypic data on daylength- and temperature-sensitive adaptation traits, soybean populations from northern China, the Russian Far East, northern North America and northern Europe (represented by Swedish accessions only) form one cluster. Genetically, however, these populations cluster differently, which is consistent with the present results for northern Chinese and European cultivars. Similarly, Chinese and American soybean cultivars have been described as having clearly distinct genetic bases (Liu et al. 2017).

The continuous reduction of genetic diversity in soybean is well documented. It runs from domestication, through the development of landraces and the ancestors of selective breeding, to present-day elite cultivars. Several genetic bottlenecks have been identified that caused the loss of rare alleles or larger structural variants (Hyten et al. 2006; Liu et al. 2020b). At the level of the elite cultivars covered in the present study, European cultivars might therefore have been expected to show lower diversity than Chinese ones. Three reasons support this expectation: the much shorter growing history, the smaller extent of breeding and the rather narrow genetic base in Europe (Tavaud-Pirra et al. 2009). Contrary to this expectation, the overall level of diversity appears rather similar within the European and the Chinese populations (Figs. 1, 2, Tables 3, 4). Average genetic distances suggest slightly higher diversity in Europe for experimental groups 1 and 4 and higher diversity in China for groups 2 and 3 (Suppl. Table 5). However, Chinese cultivars were represented in lower numbers than European ones in experimental groups 1 and 4 (Suppl. Table 2). This smaller sample size might also have contributed to the lower diversity estimates for the Chinese cultivars in these groups. The finding of remarkable European diversity corroborates earlier research (Hahn and Würschum 2014), which identified significant genetic variation in Central European soybean germplasm. Moreover, the present results demonstrate that genetic diversity can be maintained through breeding and selection. Similar results have been shown for Canadian soybeans (Bruce et al. 2019) and for North American ancestors vs. elite cultivars (Hyten et al. 2006).

Within populations, soybean cultivars were grouped according to their maturity classification. This was particularly evident in the European population, which was additionally grouped by country of origin (Fig. 1b). In a comparable set of European cultivars, genotypes also clustered clearly by maturity group and country of origin (Žulj Mihaljević et al. 2020). Similarly, three factors have been identified as major influences on the diversity of North American public soybeans: time to maturity, the geographic location of the breeding program and specific breeding decisions (Gizlice et al. 1996). The country-wise grouping (Fig. 1b) therefore probably reflects different breeding programs. It may also reflect regional adaptation to the specific environmental conditions of the different European soybean growing regions. Within the Chinese elite population, however, the clustering of cultivars by experimental maturity group or by breeding institution was less clear. One possible reason is that soybean production regions are much larger and more homogeneous in China than in Europe. Another is that Chinese cultivars are classified into primary ecotypes, such as Northeastern Spring or Northern Spring soybeans (Wang et al. 2006), rather than into narrower maturity groups. This interpretation is also supported by the AMOVA results (Tables 1 and 2), which attribute a larger percentage of variation to geographic origin than to maturity group.

The number of different SSR alleles was slightly higher in the Chinese than in the European population (Table 3). Remarkably, however, 56 private alleles were found in the European set of cultivars that were not present in the Chinese set. These alleles might have been lost during previous cycles of selection in one region, or they might reflect new mutations derived from existing alleles. The considerable number of private alleles in both the Chinese and the European population also indicates that the populations were derived from clearly different gene pools with different ancestral lines (Viana et al. 2022). In addition, differences in the allele distribution of particular SSR loci between populations might indicate signatures of selection for adaptation to particular environments (Tomicic et al. 2015).

The mean number of SSR alleles per locus in the present study was 5.5 across the two populations (Suppl. Table 3). This is similar to the sets of Žulj Mihaljević et al. (2020) and Tavaud-Pirra et al. (2009). Allele numbers were lower in Serbian (Tomicic et al. 2015) and Indian (Kumar et al. 2022) sets of elite cultivars. In contrast, allele numbers were considerably higher in several sets of soybean accessions originating from Korea (Hwang et al. 2020; Lee et al. 2014; Song et al. 2013). Particular SSR loci exhibited a rather high number of alleles, e.g. Satt281 (Suppl. Table 3), as also reported by Tavaud-Pirra et al. (2009). Other polymorphic loci had only two alleles across the whole population. For example, at locus SacK149 (Suppl. Table 3) the two alleles are associated with low or high cadmium (Cd) uptake from soil (Vollmann et al. 2015). Additional alleles at this locus would be of interest in view of their potential phenotypic effects. At individual SSR loci, the number of alleles and other parameters differed between the Chinese and European populations (Suppl. Table 4), indicating selection effects in the two regions of adaptation.
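Counting private alleles amounts to a set difference between the alleles observed in one population and those observed in all other populations. The minimal sketch below assumes one allele call per cultivar and locus, with `None` for missing data. The function name, loci, alleles and cultivars in the example are hypothetical.

```python
def private_alleles(genotypes, regions):
    """Alleles observed in one region and in no other region.

    genotypes: list of {locus: allele} dicts, one per cultivar
    (None = missing call); regions: region label per cultivar.
    Returns {region: set of (locus, allele)}.
    """
    observed = {}
    for geno, region in zip(genotypes, regions):
        observed.setdefault(region, set()).update(
            (locus, a) for locus, a in geno.items() if a is not None)
    private = {}
    for region, alleles in observed.items():
        others = set().union(*(s for r, s in observed.items() if r != region))
        private[region] = alleles - others
    return private


# Hypothetical calls for four cultivars at two loci
genotypes = [
    {"L1": "a", "L2": "x"},
    {"L1": "b", "L2": "x"},
    {"L1": "a", "L2": "y"},
    {"L1": "c", "L2": "x"},
]
private = private_alleles(genotypes, ["CN", "CN", "EU", "EU"])
print(sorted(private["EU"]))  # [('L1', 'c'), ('L2', 'y')]
```

Private allele counts grow with sample size, so comparisons between populations of different sizes are often made after rarefaction to a common size. This is a caveat worth keeping in mind when interpreting such counts.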

Genetic structure

The clear separation between Chinese and European cultivars is also confirmed by the structure analysis (Fig. 3). For the European cultivars, this analysis suggested two major ancestral lines (dark blue and green bars in Fig. 3, K = 5).

Late maturity European soybeans, as represented by Serbian cultivars (Suppl. Table 1), roughly trace back to northern US cultivars such as Evans and Hodgson (Hrustic and Miladinovic 2011). These contain ancestral varieties such as Lincoln (pedigree: Mandarin / Manchu) and Richland in their pedigrees (Allen and Bhardwaj 1987; SoyBase 2022). Consistent with this, six ancestral varieties (Mandarin, Capital, Richland, Lincoln, Strain No. 18 and Mukden) have been found to account for about 75% of the parental contribution to south-east European elite cultivars (Tomicic et al. 2015).

In contrast, early maturity European cultivars grown in central and northern Europe often trace back to two sources. One is extremely early germplasm developed in Sweden from material obtained from Sakhalin (Fiskeby and Holmberg varieties). The other is early maturity Canadian (i.e. Ontario) varieties. Both Canadian and Swiss soybeans have been selected for chilling tolerance and adaptation to cool environments (Yamaguchi et al. 2018). Examples are the widely grown cultivars Maple Arrow (pedigree: Harosoy 63 / Holmberg 840-1-3) and Ceresia (pedigree: Fiskeby V / Harosoy 107-2031-2). Many modern Canadian, Swiss, German, Polish and Austrian cultivars are related through the use of Swedish early maturity germplasm (SoyBase 2022). For example, Fiskeby V was introgressed via Bicentennial into at least nine North American cultivars, and Maple Arrow is present in the pedigrees of almost 20 modern cultivars. Accordingly, Canadian, Swiss and German germplasm has been described as genetically similar (Hahn and Würschum 2014). The two ancestral lines structuring the European elite cultivar population (Fig. 3, K = 5) might therefore reflect the two different breeding pathways of cultivars from south-east and from central-north Europe.

The Chinese elite population is structured into three major ancestral lines (Fig. 3, K = 5, yellow, red and light blue bars). This suggests greater variation in genetic background than in the European population. The Heihe and Dongnong cultivar series are earlier in maturity and the Hefeng and Suinong series clearly later. Nevertheless, the structure within each of these series is not homogeneous. This clearly illustrates the different breeding histories of China and Europe. As discussed above, European soybean diversity is largely based on a limited number of distinct plant introductions adapted to small and separate agroecological regions. In contrast, Chinese soybean diversity in the large Northeastern Spring and Northern Spring sowing regions represented in the present study appears more continuous. This continuity likely reflects multiple ancestral contributions and larger growing areas (Li et al. 2008; Liu et al. 2020b; Wang et al. 2006).

LD decay

Differences in LD decay (Bruce et al. 2019; Contreras-Soto et al. 2017; Viana et al. 2022) may have multiple causes, including population size and composition, breeding intensity and maturity group. The faster LD decay in the European than in the Chinese population (Fig. 4) might again reflect the lower number of ancestors in the European gene pool. With fewer ancestral lines, relatively more crosses would have been made per line, resulting in more recombination. Other population-specific factors or stronger selection in the Chinese population might also have contributed to the overall difference.

Fig. 4

Comparative estimation of linkage disequilibrium (LD) decay with increasing physical distance between SNPs, based on SNP array data for all cultivars and for European (EU) and Chinese (CN) elite soybean cultivars separately

Conclusions

The comparative diversity analysis of Chinese and European elite soybeans has relevant implications for future soybean breeding in both regions. The level of genetic diversity in elite cultivars is of similar magnitude in the two regions. This demonstrates that modern plant breeding for specific target environments within geographic regions can maintain overall genetic variation. Because of regional adaptation needs within Europe, European cultivars were separated by maturity classification; this separation was less pronounced for Chinese cultivars. Population structure analysis suggests that European cultivars are based on two major ancestral lines. Chinese cultivars, in contrast, trace back to more ancestral lines, reflecting the rich natural soybean diversity present in China.