Molecular characterization and genetic diversity studies of Indian soybean (Glycine max (L.) Merr.) cultivars using SSR markers

Background: The genetic base of soybean cultivars in India has been reported to be extremely narrow, due to repeated use of few selected and elite genotypes as parents in the breeding programmes. This ultimately led to the reduction of genetic variability among existing soybean cultivars and stagnation in crop yield. Thus in order to enhance production and productivity of soybean, broadening of genetic base and exploring untapped valuable genetic diversity has become quite indispensable. This could be successfully accomplished through molecular characterization of soybean genotypes using various DNA based markers. Hence, an attempt was made to study the molecular divergence and relatedness among 29 genotypes of soybean using SSR markers.
Methods and results: A total of 35 SSR primers were deployed to study the genetic divergence among 29 genotypes of soybean. Among them, 14 primer pairs were found to be polymorphic producing a total of 34 polymorphic alleles; and the allele number for each locus ranged from two to four with an average of 2.43 alleles per primer pair. Polymorphic information content (PIC) values of SSRs ranged from 0.064 to 0.689 with an average of 0.331. The dendrogram constructed based on dissimilarity indices clustered the 29 genotypes into two major groups and four sub-groups. Similarly, principal coordinate analysis grouped the genotypes into four major groups that exactly corresponded to the clustering of genotypes among four sub-groups of dendrogram. Besides, the study has reported eight unique and two rare alleles that could be potentially utilized for genetic purity analysis and cultivar identification in soybean.
Conclusion: In the present investigation, two major clusters were reported and grouping of large number of genotypes in each cluster indicated high degree of genetic resemblance and narrow genetic base among the genotypes used in the study. With respect to the primers used in the study, the values of PIC and other related parameters revealed that the selected SSR markers are moderately informative and could be potentially utilized for diversity analysis of soybean. The clustering pattern of dendrogram constructed based on SSR loci profile displayed good agreement with the cultivar’s pedigree information. High level of genetic similarity observed among the genotypes from the present study necessitates the inclusion of wild relatives, land races and traditional cultivars in future soybean breeding programmes to widen the crop gene pool. Thus, hybridization among diverse gene pool could result in more heterotic combinations ultimately enhancing genetic gain, crop yield and resistance to various stress factors.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11033-021-07030-4.


Introduction
Soybean (Glycine max (L.) Merr.) is one of the world's most important economic legume crops and the second most important oilseed crop of India. It is a rich source of oil and protein (20% and 40%, respectively) for both human food and animal feed [1,2]. Among the soybean-growing countries of the world, the USA ranks first in both production and productivity, with 96.62 m.t. and 3157 kg/ha, respectively, from 306.03 lakh ha of area, whereas India, in fifth position, produces 9.00 m.t. with a productivity of 800 kg/ha from 112.5 lakh ha of area [3]. Although production and productivity have increased remarkably in other major soybean-growing countries over the past few decades, India still lags far behind in productivity because of stagnated yields [4]. Even though the area and production of soybean in India have increased considerably during the last three decades through the adoption of new varieties, the crop yield potential has remained static and has become a major concern among researchers [4,5]. This could be attributed to the narrow genetic base of soybean cultivars, which is either inherent in the genetic architecture of the crop (self-pollination) or due to the extensive use of selected genotypes as parental lines in breeding programmes [6]. Such targeted hybridization has led to genetic uniformity and a further shrinking of the genetic base of the soybean germplasm, compromising yield and increasing susceptibility to several biotic and abiotic stresses [7]. Therefore, understanding the genetic diversity of Indian soybean germplasm is critical for exploring untapped valuable genetic traits to enhance soybean production and productivity. Even though genetic similarity may not necessarily lead to an immediate epidemic, a more divergent genetic background is always a key requisite for defense against unanticipated outbreaks of pests and diseases [4,8].
Diversity in plant genetic resources often enhances the opportunity of plant breeders to breed new and improved cultivars with desirable characteristics [9]. Thus, information on the genetic diversity of soybean genotypes could help breeders and geneticists to interpret the germplasm architecture, facilitate the selection of parents with higher levels of diversity, predict superior combinations that deliver the best offspring and accelerate the broadening of the genetic base [10]. The assessment of genetic diversity within and between populations is routinely performed using morphological characterization, biochemical markers and various molecular marker techniques [11]. Among these, morphological and biochemical markers are profoundly influenced by the environment and several other factors, hence the results are less reproducible and the estimates may be unreliable or biased [12]. DNA-based marker systems serve as an alternative strategy for precisely discriminating closely related species and cultivars [13,14]. They work by highlighting differences in the nucleotide sequence between individuals and remain insensitive to environmental factors [15]. Molecular markers can be broadly classified on the basis of (i) method of detection, as (a) non-PCR-derived or hybridization-based techniques (RFLP-Restriction Fragment Length Polymorphism, VNTRs-Variable Number of Tandem Repeats), (b) PCR-derived or amplification-based techniques (RAPD-Random Amplified Polymorphic DNA, AFLP-Amplified Fragment Length Polymorphism, STMSs-Sequence Tagged Microsatellites, SCARs-Sequence Characterized Amplified Regions, CAPS-Cleaved Amplified Polymorphic Sequences, SSLPs-Simple Sequence Length Polymorphisms, Microsatellites or SSRs-Simple Sequence Repeats) and (c) sequence-based markers (SNPs-Single Nucleotide Polymorphisms, DArT-Diversity Arrays Technology); and (ii) mode of gene action, as (a) dominant markers (RAPD, AFLP etc.) and (b) co-dominant markers (CAPS, SCAR, SSR etc.) [16].
Among the PCR-based DNA markers listed above, SSRs have demonstrated the highest rate of polymorphism and are much better at identifying unique alleles among elite soybean germplasm than other marker systems [10-12]. SSRs consist of short tandem repeats distributed over the genome that are hyper-variable, which makes them an excellent tool for pedigree analysis, genotype differentiation, evaluation of genetic distances or relatedness among genotypes and varietal identification [17-19]. SNPs, in contrast, have been widely reported as the most abundant class of DNA markers and possess low rates of recurrent mutation, which makes them evolutionarily stable. They serve as excellent markers for dissecting complex genetic traits and for studying patterns of genomic evolution [20]. In this view, SNPs could serve as an alternative to SSRs for analysis of genetic diversity; however, their biallelic nature, low information content and high cost mean that SSRs remain the markers of choice for genetic diversity studies in many crop species [15]. Supporting this, a comparative genetic diversity analysis of sugar beet cultivars using SNPs, DArT and SSRs revealed that the success rate was highest for SSR markers owing to their highly polymorphic nature [21,22]. Many studies have documented that SSR markers are highly productive for estimating genetic diversity and relationships among soybean genotypes [12, 17-28]. However, this position may well be challenged in the future with the development of cheap methods for assaying SNPs.
Annually, several breeding lines and varieties of soybean are developed through selection and hybridization programmes across the globe. Presently, there are more than 100 extant varieties of soybean cultivated in India. Nevertheless, the success of these high-yielding and improved varieties largely relies on the availability of quality seed with high genetic purity standards [29,30]. The genetic purity of commercial seed lots is traditionally assayed by Grow Out Tests (GOT) based on morphological characters, which are not only time-consuming and laborious but also highly responsive to the environment [24]. Hence, SSRs are widely deployed for rapid genetic purity assessment and identification of both varieties and hybrids in soybean [25-27]. Keeping this in view, the present investigation was carried out with two objectives: (i) to study the genetic diversity among 29 genotypes of soybean using selected hypervariable polymorphic SSR markers and (ii) to explore unique and rare alleles that would be useful for genetic purity analysis and varietal identification of soybean.

Plant material
A total of 29 improved and cultivated genotypes/varieties of soybean were obtained from different breeding centers across India and used in the present study. The varieties selected represent a large range of the varieties grown in India, and most of them are notified and released for cultivation across different agro-climatic zones of India. Detailed information on the pedigree and distinguishing characteristics of all 29 genotypes is presented in Table 1.

DNA isolation and PCR amplification
Genomic DNA was extracted from seeds using the DNeasy® Plant Mini Kit (Qiagen, USA) as per the manufacturer's instructions. The concentration and quality of the DNA samples were estimated using a NanoDrop 2000™ spectrophotometer (Thermo Fisher Scientific, USA). All other chemicals used for DNA extraction and amplification were purchased from Sigma-Aldrich, Germany. Finally, all the genomic DNA samples were diluted to a final concentration of 20 ng µL⁻¹ with 1X TE buffer (10 mM Tris-HCl, pH 8.0; 1 mM EDTA) and stored at −20 °C for further use. Polymerase chain reaction (PCR) amplification was conducted in a 25 µL reaction mixture containing 1X PCR assay buffer (50 mM KCl, 10 mM Tris-Cl, 1.5 mM MgCl₂), 200 µM each of dNTPs, 0.2 µM each of forward and reverse primers, 0.6 U Taq DNA polymerase and 25 ng of genomic DNA. All PCR reactions were carried out in a thermal cycler AG 22331. The thermal profile consisted of an initial denaturation at 94 °C for 5 min, followed by 33 cycles of denaturation (94 °C for 1 min), annealing (55 °C for 1 min) and primer extension (72 °C for 2 min), and a final extension step (72 °C for 7 min).
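The reaction set-up above implies two small calculations that are easy to get wrong at the bench: the volume of diluted DNA needed per reaction and the programmed run time of the thermal profile. The following is a minimal sketch using only the figures stated in the text; the function names are illustrative, and ramping time between temperatures is ignored.

```python
# Figures taken from the protocol described in the text.
STOCK_DNA_NG_PER_UL = 20.0   # diluted working stock
TEMPLATE_NG = 25.0           # genomic DNA per reaction
REACTION_UL = 25.0           # total reaction volume


def template_volume_ul(template_ng, stock_ng_per_ul):
    """Volume of DNA stock (in µL) that delivers the requested template mass."""
    return template_ng / stock_ng_per_ul


def programmed_minutes(initial=5, cycles=33, steps=(1, 1, 2), final=7):
    """Sum of programmed hold times in minutes.

    steps = (denaturation, annealing, extension) per cycle.
    Ramp times between temperatures are not included.
    """
    return initial + cycles * sum(steps) + final


dna_ul = template_volume_ul(TEMPLATE_NG, STOCK_DNA_NG_PER_UL)
print(f"DNA stock per reaction: {dna_ul} µL")
print(f"Final template concentration: {TEMPLATE_NG / REACTION_UL} ng/µL")
print(f"Programmed run time: {programmed_minutes()} min")
```

With the stated values this gives 1.25 µL of DNA stock per reaction and 144 min of programmed hold time.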
Amplified PCR products were separated by electrophoresis on 3% (w/v) Metaphor™ agarose gel, stained with ethidium bromide (1 mg/mL) and photographed under UV light using Image Lab™ software. The size of the amplified products was determined using a 50 bp DNA ladder as size standard. SSR markers developed by Cregan et al. [32] were used in the present study. A total of 35 SSR markers representing all the 20 linkage groups of soybean were chosen for genotyping from the SSR database (http://www.soybase.org) and are presented in supplementary Table 1.

SSR allele scoring and data analysis
The presence or absence of each SSR fragment was recorded in every genotype for all the polymorphic SSR primers. Unambiguous bands were scored as 1 (present) or 0 (absent) for each primer pair. The size of each amplicon was calculated from band mobility relative to the molecular mass of the ladder. The polymorphic information content (PIC) and expected heterozygosity (H) reflect the discriminating ability of a marker, which depends on the number of known alleles and their frequency distribution, and are thus analogous to genetic diversity; they were calculated using the formulae given by Botstein et al. [33] (Eq. 1) and Liu [34] (Eq. 2), respectively. In Eq. 1, p_i and p_j denote the population frequencies of the ith and jth alleles. The first summation is over the total number of alleles, whereas the two subsequent summations run over all i and j where i ≠ j.
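Eq. 1 and Eq. 2 themselves are not reproduced in this text. In their commonly cited forms, which match the description of the summations given here but are an assumption about the exact expressions the authors used, they read as follows, with n the number of alleles at the locus:

```latex
\begin{align}
\mathrm{PIC} &= 1 - \sum_{i=1}^{n} p_i^{2}
               - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2\, p_i^{2}\, p_j^{2} \tag{1} \\
H &= 1 - \sum_{i=1}^{n} p_i^{2} \tag{2}
\end{align}
```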
In both equations, p_i is the frequency of the ith allele in the set of genotypes analysed, calculated for each SSR locus. Effective multiplex ratio (EMR) was calculated as the total number of polymorphic loci per primer multiplied by the proportion of polymorphic loci among the total number of loci [35,36]. Marker index (MI) is a statistical parameter used to estimate the total utility of the marker system; MI, the product of PIC and EMR, was calculated as per Powell et al. [35]. Resolving power (R_p) characterizes the ability of a primer combination to detect differences between a large number of genotypes and was calculated according to Prevost and Wilkinson [37].
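The marker statistics described above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not the authors' workflow: PIC and H use the commonly cited Botstein and gene-diversity forms, EMR and MI follow the verbal definitions in the text, and R_p is taken as the sum of band informativeness, 1 − 2|0.5 − p|, over the bands of a primer (the usual Prevost and Wilkinson form). All function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) for the allele frequencies of one locus."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n) for j in range(i + 1, n))
    return expected_heterozygosity(freqs) - cross


def effective_multiplex_ratio(n_polymorphic, n_total):
    """EMR = number of polymorphic loci times the polymorphic fraction."""
    return n_polymorphic * (n_polymorphic / n_total)


def marker_index(pic_value, emr):
    """MI = PIC x EMR."""
    return pic_value * emr


def resolving_power(band_matrix):
    """Rp for one primer.

    band_matrix: one row per genotype, one 0/1 column per band.
    Each band contributes 1 - 2*|0.5 - p|, where p is the fraction
    of genotypes showing that band.
    """
    n_genotypes = len(band_matrix)
    total = 0.0
    for band in zip(*band_matrix):
        p = sum(band) / n_genotypes
        total += 1 - 2 * abs(0.5 - p)
    return total


# Worked check with two equally frequent alleles:
# H = 1 - 0.5 = 0.5 and PIC = 0.5 - 2 * 0.25 * 0.25 = 0.375.
print(expected_heterozygosity([0.5, 0.5]), pic([0.5, 0.5]))
```

Note that PIC never exceeds H for the same locus, because the cross term subtracted in Eq. 1 is non-negative.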
A phylogenetic tree was constructed from the genotyping data of the selected polymorphic SSR markers using DARwin software (version 6.0.21) [38] on the basis of genetic distances. The genetic similarity among genotypes was estimated from the dissimilarity (distance) matrix generated using the simple matching coefficient. The resulting dissimilarity matrix was further analysed using the unweighted pair-group method with arithmetic average (UPGMA) clustering algorithm to construct a dendrogram. Similarly, a neighbor-joining tree was constructed from the dissimilarity matrix using the unweighted neighbor-joining algorithm in DARwin software (version 6.0.21) [38]. The robustness of the nodes of the neighbor-joining tree was assessed from 1000 bootstrap replicates, and bootstrap values of > 50% were displayed. Principal Coordinates Analysis (PCoA) is a multidimensional scaling (MDS) method used to explore and visualize similarities or dissimilarities in a dataset. It uses either a similarity matrix or a dissimilarity matrix obtained from the original variables and assigns each object (here, each variety) a specific location in a low-dimensional space. In the present study, PCoA was performed to identify similarity indices between the varieties based on Euclidean distance using Past software (version 4.02) [39].
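To make the clustering step concrete, the sketch below computes simple matching dissimilarities from 0/1 band profiles and builds a UPGMA tree from them. It is a small, self-contained illustration and not a reproduction of DARwin's implementation; the genotype labels and band profiles are hypothetical, and ties between equally close pairs are broken by input order.

```python
def simple_matching_dissimilarity(a, b):
    """Fraction of scored bands at which two 0/1 profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(profiles):
    """Cluster 0/1 profiles by UPGMA.

    profiles: dict mapping genotype name -> list of 0/1 band scores.
    Returns (tree, merge_distances), where tree is a nested tuple of names.
    """
    names = list(profiles)
    trees = {n: n for n in names}
    sizes = {n: 1 for n in names}
    d = {}
    for a in names:
        for b in names:
            if a != b:
                d[(a, b)] = simple_matching_dissimilarity(profiles[a], profiles[b])
    active = list(names)
    merge_distances = []
    next_id = 0  # integer ids for merged clusters cannot clash with string names
    while len(active) > 1:
        # Closest pair of active clusters (first one wins on ties).
        pairs = ((x, y) for i, x in enumerate(active) for y in active[i + 1:])
        a, b = min(pairs, key=lambda pair: d[pair])
        new = next_id
        next_id += 1
        for c in active:
            if c not in (a, b):
                # UPGMA update: size-weighted mean distance to the merged pair.
                value = (sizes[a] * d[(a, c)] + sizes[b] * d[(b, c)]) / (sizes[a] + sizes[b])
                d[(new, c)] = d[(c, new)] = value
        trees[new] = (trees[a], trees[b])
        sizes[new] = sizes[a] + sizes[b]
        merge_distances.append(d[(a, b)])
        active = [c for c in active if c not in (a, b)] + [new]
    return trees[active[0]], merge_distances


# Hypothetical band profiles for four genotypes scored at six bands.
example = {
    "G1": [1, 0, 1, 0, 1, 0],
    "G2": [1, 0, 1, 0, 0, 1],
    "G3": [0, 1, 0, 1, 1, 0],
    "G4": [0, 1, 0, 1, 0, 1],
}
tree, distances = upgma(example)
print(tree)
print(distances)
```

In this toy example G1 pairs with G2 and G3 with G4 before the two pairs are joined, which mirrors how closely related varieties fall into the same sub-group of a dendrogram.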

SSR polymorphism
A total of 29 promising varieties of soybean were analysed in the present study using 35 SSR markers (Table 2).

Genetic diversity and relatedness among genotypes
Cluster analysis was performed to elucidate the relationship among the genotypes, and the dendrogram is presented in Fig. 1. A neighbor-joining tree (Fig. 2) displaying the genetic relationships among soybean genotypes was also constructed based on the alleles detected from 14 SSR markers. The genetic distance-based results seen in the neighbor-joining tree revealed three major clusters, resembling the clusters of the UPGMA-based dendrogram. PCoA was also performed to analyze multi-dimensional relationships that describe the proportion of genetic variance in the dataset, based on the similarity indices (Fig. 3). The scatter plot generated from PCoA clustered the 29 genotypes of soybean into four groups based on similarity.

Unique alleles
Among the 34 polymorphic alleles identified, eight were detected to be unique alleles generated in specific varieties (Table 3). Satt406 generated a unique allele of 100 bp specific to variety PK 472 (Fig. 4).

Rare alleles
As per the International Union for the Protection of New Varieties of Plants (UPOV) guidelines, rare alleles are those present at a specific locus that appear with a frequency below an agreed threshold (commonly 5-10%), and hence they may also be employed in cultivar identification. In the current study, two rare alleles were detected, each appearing in two to three varieties. An amplicon of 70 bp generated from Satt245 appeared in two varieties, Kalitur and Karune, whereas another allele of 90 bp generated from Satt431 appeared in three varieties, viz., Kalitur, NRC 105 and RKS 24.
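The screening for unique and rare alleles can be expressed as a simple tally of which varieties carry each allele. The sketch below is illustrative only: it assumes one allele per marker per variety (reasonable for inbred lines, but an assumption here), treats an allele as unique when exactly one variety carries it, and treats it as rare when more than one variety carries it at a frequency at or below a chosen threshold. The threshold of 0.10, the marker name and the variety labels are all hypothetical.

```python
from collections import defaultdict


def classify_alleles(genotypes, rare_threshold=0.10):
    """Split alleles into unique and rare classes.

    genotypes: dict mapping variety -> {marker: allele size in bp}.
    Returns (unique, rare); each maps (marker, size) -> sorted carrier list.
    """
    carriers = defaultdict(set)
    for variety, loci in genotypes.items():
        for marker, size_bp in loci.items():
            carriers[(marker, size_bp)].add(variety)

    n_varieties = len(genotypes)
    unique, rare = {}, {}
    for allele, varieties in carriers.items():
        if len(varieties) == 1:
            unique[allele] = sorted(varieties)
        elif len(varieties) / n_varieties <= rare_threshold:
            rare[allele] = sorted(varieties)
    return unique, rare


# Hypothetical panel of 25 varieties scored at one marker, "M1".
panel = {f"V{i}": {"M1": 110} for i in range(1, 26)}
panel["V1"]["M1"] = 100   # carried by a single variety
panel["V2"]["M1"] = 120   # carried by two varieties (2/25 = 8%)
panel["V3"]["M1"] = 120

unique, rare = classify_alleles(panel)
print("unique:", unique)
print("rare:", rare)
```

A unique allele identifies one variety on its own, whereas a rare allele narrows identification to a small set of varieties and usually needs to be combined with other markers.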

Discussion
Deployment of SSR markers for assessment of genetic diversity has been widely adopted for screening soybean germplasm [40,41]. In the present study, 14 markers present on six linkage groups of soybean were found to be polymorphic, and the high percentage of polymorphic loci (70.8%) detected was consistent with previous reports [42,43]. [43] in soybean. However, other studies reported higher genetic diversity using SSR markers, with 4.9 alleles and an average PIC of 0.560 [42], 4 alleles with a PIC of 0.580 [46], 4 alleles with a PIC of 0.590 [47] and 5 alleles with a PIC of 0.610 [48]. All these studies reported a higher number of alleles per locus and higher average PIC values than the present study.
Most of the SSR markers (10/14) used in this study had PIC values ≥ 0.3, and one marker, Satt440, had a PIC value of > 0.6 together with the highest number of alleles (4), which points to a strong correlation between PIC and allelic richness [49]. Allelic richness can therefore serve as an effective index for diversity evaluation; nevertheless, it largely depends on the sample size [12]. The moderate level of allelic richness and PIC values observed in the present study could also be attributed to the narrow genetic base of the cultivars used for analysis. The average heterozygosity of 0.401 observed in the present study is in agreement with the findings of Zhang et al. [41] and Wang et al. [50], who reported average heterozygosity values of 0.460 and 0.446 in vegetable and wild types of soybean, respectively. Further, the MI, EMR and heterozygosity values indicate that the SSR markers selected for the present study are moderately informative and could be utilized for diversity analysis of soybean genotypes.
The UPGMA dendrogram and the neighbor-joining tree are in agreement with each other, and the clustering of genotypes reflects either homology in their origin or similarity in the parental material used in breeding programmes [6,47]. The first cluster of the neighbor-joining tree comprised 14 genotypes, resembling cluster II of the UPGMA dendrogram. The second cluster comprised only two genotypes, while the third cluster had 13 genotypes in the neighbor-joining tree. The two genotypes MAUS 71 and JS 80-21, which emerged as a separate cluster (cluster II) in the neighbor-joining tree, were merged within cluster I of the UPGMA dendrogram. However, these two genotypes formed a separate sub-group (Ib) under cluster I of the UPGMA dendrogram. The pattern of clustering of the remaining genotypes remained the same in both the UPGMA dendrogram and the neighbor-joining tree. In the present study, all 29 genotypes of soybean were grouped into two major clusters and various sub-groups in the dendrogram on the basis of their genetic relationships. Similarly, Wang et al. [13] reported two clusters using ten SSR markers, Tantasawat et al. [49] identified four clusters using 11 SSR markers, Ghosh et al. [51] reported two clusters and six sub-clusters from 10 SSR markers, Chauhan et al. [47] obtained two clusters using 21 SSR markers and Hipparagi et al. [45] reported three clusters using 21 SSR markers in soybean.
The genotypes Kalitur, JS 335, JS 97-52, JS 76-205, JS 95-60, JS 20-69 and JS 20-34, released from the Jabalpur center, and NRC 105, NRC 130, NRC 131 and NRC 37, developed at the Indore center, were clustered under sub-group (Ia) of cluster I of the dendrogram on the basis of genetic affinity. Under cluster I, all the varieties developed through hybridization share close affinity with each other due to homology in the parental material used for hybridization; based on the degree of relatedness, the varieties were demarcated into different sub-groups (Ia and Ib) under a single cluster. In agreement with this, the two varieties JS 80-21 and MAUS 71, which originated from different breeding centers, were grouped under cluster I but diverged as a separate sub-group (Ib) due to the genetic distance between the parental materials utilized for hybridization. Interestingly, these two genotypes were demarcated as a separate cluster in both the neighbor-joining tree and the PCoA grouping.
In the case of cluster II, the varieties RVS 2001-4, RVS 2001-18, NRC 86, JS 93-05 and RKS 24, developed at different breeding centers, were clustered under sub-group (IIa), which could be attributed either to homology in the parental material used in the breeding programmes or to the origin of the parent material. Likewise, all the varieties released from Pantnagar, viz., Shlajeet, PS 1092 and PK 472, and one variety each, viz., MAUS 61, NRC 7, Type 49, SL 525 and Indira Soya 9, developed at the Parbhani, Indore, Pune, Ludhiana and Raipur centers, respectively, clustered under the same sub-group (IIb) of cluster II. These findings are supported by the pedigrees presented in Table 1. Apart from this, two sets of sister lines, viz., JS 95-60 and JS 93-05 (developed through selection from PS 73-22) and JS 20-69 and JS 20-98 (developed through hybridization from JS 97-52 × SL 710), were grouped into different clusters since they had different genetic profiles at these 14 polymorphic loci, which reinforces the effectiveness of the SSR markers used in the study. The results obtained from the present study demonstrate the potential of SSR markers for precise varietal identification, supporting the findings of Chotiyarnwang et al. [52], Tantasawat et al. [49] and Singh et al. [53].
To complement the information obtained from hierarchical cluster analysis, PCoA was performed; it separated the genotypes into four major groups corresponding exactly to the four sub-groups (Ia, Ib, IIa and IIb) of the dendrogram. The PCoA also revealed that most of the soybean genotypes were intermixed into a large group (except MAUS 71 and JS 80-21). The results obtained are consistent with the findings of previous reports [12, 13, 42, 47-49, 51, 52, 54, 55]. The results from the present study show that SSR markers can serve as an efficient tool for analysing the genetic diversity among genotypes and can also aid in determining pedigree relationships in soybean.
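For readers unfamiliar with how PCoA places genotypes in a low-dimensional space, the sketch below implements classical PCoA on a distance matrix in plain Python: the squared distances are double-centred, and the leading axes are extracted by power iteration with deflation. It is an illustration of the general technique, not of the Past implementation, and it assumes distances that are close to Euclidean (axes with non-positive eigenvalues are dropped). The function name and the three-point example are hypothetical.

```python
import math


def pcoa(dist, n_axes=2, iters=500):
    """Classical principal coordinates analysis.

    dist: symmetric distance matrix as a list of lists.
    Returns (coords, eigenvalues); coords[i] holds the axis scores of object i.
    """
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * n_axes for _ in range(n)]
    eigenvalues = []
    for axis in range(n_axes):
        v = [math.sin(i + 1 + axis) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v.
        bv = matvec(b, v)
        lam = sum(x * y for x, y in zip(v, bv)) / sum(x * x for x in v)
        if lam <= 1e-9:
            break  # no further positive axis to extract
        eigenvalues.append(lam)
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next axis.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, eigenvalues


# Three hypothetical objects lying on a line at positions 0, 1 and 3.
d = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
coords, eigenvalues = pcoa(d)
print(coords)
print(eigenvalues)
```

Because the three example objects lie on a line, a single axis reproduces their pairwise distances; in real data the first two or three axes usually capture only part of the variation, which is why the share of variance explained per axis is normally reported alongside the scatter plot.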
Unique or rare alleles generated through natural mutation and selection [56] serve as distinctive markers and are often utilized for categorization of germplasm collections, breeding and genetic purity analysis [24]. This study reported eight unique alleles amplified from six primer pairs that are specific to seven varieties and could potentially be utilized for varietal identification and DNA fingerprinting. In agreement with this, Meesang et al. [25] and Zhang et al. [41] have validated the use of SSR markers for genetic purity analysis in different varieties and hybrids of soybean. Similar to the present study, Tantasawat et al. [49], Sahu et al. [27] and Rani et al. [31] detected unique alleles using SSR markers. Further, the study detected two rare alleles, generated from two different markers, each of which is shared by 2-3 varieties and therefore identifies that set of varieties rather than a single one. Similar results were reported by Rani et al. [31], wherein 11 rare alleles were identified that could potentially identify a set of 2-11 soybean cultivars.

Conclusion
In the present study, the extent of genetic diversity among the investigated genotypes of soybean was found to be moderate and distributed over two major clusters, as evident from the UPGMA dendrogram. The clustering of a large number of genotypes in each cluster indicated high genetic relatedness among the material used. Further, a good association was noticed between the genetic divergence among the cultivars and their origin and pedigree. The present study also supports the hypothesis that a narrow genetic base exists among the soybean cultivars of India. In addition, the study identified a set of 14 polymorphic markers that could be used for diversity analysis of soybean. Besides, the information on unique and rare alleles obtained from the study could be utilized for cultivar identification and genetic purity control in soybean. To explore the diversity of soybean further, the use of more SSR markers covering the soybean genome would be desirable in future studies. In summary, the results of this study indicate that widening the soybean genetic base is essential to exploit heterosis and overcome yield stagnation. This can be achieved by introducing new alleles into the future soybean breeding programmes of India through the inclusion of more landraces, wild relatives and exotic germplasm lines.