Genetic diversity and population structure of Robinia pseudoacacia from six improved variety bases in China as revealed by simple sequence repeat markers

Robinia pseudoacacia is an important afforestation tree introduced to China in 1878. In the present study, we examined the genetic diversity among 687 strains representing four improved varieties and two secondary provenances, comprising 641 clones and 46 seedlings. Ninety-one simple sequence repeats (SSRs) were selected through segregation analysis and polymorphism characterization, and all sampled individuals were genotyped using well-characterized SSR markers. After excluding loci with non-neutral equilibrium, missing locus data and null alleles, we used 36 primer pairs to assess the genetic diversity of these germplasm resources, revealing vast genetic differentiation among the samples, with an average of 8.352 alleles per locus and a mean Shannon′s index of 1.302. At the population level, the partitioning of variability was assessed using analysis of molecular variance, which revealed 93% and 7% variation within and among collection sites, respectively. Four clusters were detected using structure analysis, indicating a degree of genetic differentiation among the six populations. Insights into the genetic diversity and structure of R. pseudoacacia provide a theoretical basis for the conservation, breeding and sustainable development of this species in China.


Introduction
The persistence and evolutionary potential of species depend on genetic diversity (Linda et al. 2010), which increases as the population size increases because larger populations contain more potential diversity, while small populations tend to lose genetic diversity over time through random events (genetic drift processes) (Frankham et al. 2010; Leimu et al. 2006; Fant et al. 2014). As a result, populations with more individuals usually have greater genetic diversity (Basey et al. 2015). In addition, genetic diversity can help populations cope with rapid anthropogenic environmental changes (Hughes et al. 2004; Jump et al. 2009). The hereditary basis of a breeding population determines breeding quality and potential after long-term natural or artificial selection (Jin et al. 2016). Therefore, understanding the genetic diversity of plant resources provides opportunities for breeders to develop new and improved cultivars with desirable characteristics and is also important for the maintenance of germplasm collections (Govindaraj et al. 2015; Duan et al. 2017).
Black locust, which originated in North America, is an important deciduous tree that has been planted extensively in 27 provinces in China (Tian et al. 2003; Qiu et al. 2010; Zhang et al. 2015) and has been naturalized in this country since 1878. It possesses numerous useful characteristics, including a high relative growth rate and the production of a large biomass of high-density wood that is easy to dry and process and combusts well, making it one of the most economically important tree species in China (Rédei et al. 2008; Kropf et al. 2010; Benesperi et al. 2012; Boring and Swank 1984). The plant is stress-resistant; it has evolved to withstand drought and saline-alkaline soils and can grow well in barren soil. Therefore, it is an ideal choice for afforestation in northwest China. At the same time, it is a strong nitrogen fixer and thus has potential for improving biodiversity due to its root nodules, which contain symbiotic nitrogen-fixing bacteria and parasitic tissues that allow root uptake of nitrogenous compounds (Rice et al. 2004; Nicolescu et al. 2018; Xu et al. 2019), thereby enhancing plant growth, even in less-fertile soils. Unsurprisingly, demand for these valuable trees has increased.
Since the 1980s, to fully explore and exploit the potential genetic capacity of R. pseudoacacia, phenotypic determination methods have been used to evaluate black locust throughout China, including the dominant wood comparative method, standard land method, index method, statistical test method, and scoring method (Shu 1988; Gu et al. 1990; Zhang et al. 1990; Xun et al. 2009). The selection and collection of high-quality trees has led to the development of a seed orchard and improvement of the genetic quality of seeds available for artificial afforestation. That work was carried out through the cooperation of three divisions of forestry departments (city, county, farm) to construct breeding archives and monitor disease periodically to prevent damage (Hunt et al. 2004; Li et al. 2014; Yin et al. 2014; Zhao et al. 2014). Subsequently, the selected high-quality trees were used to establish large-scale seed orchards using clones with provenance in four provinces (Shandong, Gansu, Liaoning, and Shanxi) in China. Ex situ conservation of germplasm resources not only provides the basis for reintroduction but also is an important supplemental measure for in situ conservation (Ramsay et al. 2000; Li et al. 2018a, b). These high-quality trees were used in traditional breeding research, including the creation of new cultivars based on hybridization, estimation of volume growth, development and anatomy of the floral nectary, and studies of black locust resistance to physiological and biochemical stresses (Jiang et al. 2015; Zhang et al. 2016; Li et al. 2018a, b; Han et al. 2019; Wang et al. 2019). In addition, black locust genetic diversity has been evaluated in several studies using molecular markers (Yin et al. 2014; Mao et al. 2017; Li et al. 2019; Dong et al. 2019a, b); however, those studies used small sample sizes and different types of markers. Therefore, their results cannot be used to assess whether high-quality trees are used effectively in different orchards.
In the present study, we used 36 SSR loci to obtain genotyping data of R. pseudoacacia germplasm resources from orchards in China by nondenaturing polyacrylamide gel electrophoresis (PAGE). Their genetic diversity and structure were evaluated and systematically described. In addition, gene flow among orchards was revealed, and migrants and admixed individuals were identified. This information may be useful for the genetic differentiation of black locust in China and aid the conservation and breeding of this species.
Total DNA was isolated from all black locust samples through drying and extraction with DP320 Plant Genomic DNA kits (Tiangen, Beijing, China). After extraction, the integrity of the genomic DNA was determined using 1% agarose gel, and the DNA purity and quantity were measured using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). Each DNA sample was diluted to an equal concentration of 20 ng/μL with TB elution buffer (Tiangen), labeled quickly, and placed in a quick-freeze box for storage at −80 °C until it was used for SSR amplification.

Microsatellite marker screening and SSR genotyping
Ninety-one pairs of SSR primers were used to screen eight black locust samples of different phenotypes and sources (Guo et al. 2018). The SSR primers were synthesized by TSINGKE Biological Technology Co., Ltd. (Beijing, China). The PCR amplification system and procedure were reported previously by Guo et al. (2017). Briefly, the 20-μL reaction mixture included 2 μL (20 ng/μL) genomic DNA, 1 μL (10 μM) each of the reverse and forward primers, 10 μL 2 × PCR Master Mix (blue) (TSINGKE, Beijing, China), and 6 μL ddH2O. PCR amplification was carried out using a Bio-Rad T100 thermal cycler (Hercules, CA, USA), with the amplification protocol described by Guo et al. (2017) and Schuelke (2000). Subsequently, the PCR products were resolved through 6% non-denaturing PAGE and visualized by silver staining and ultraviolet light, as also described by Guo et al. (2018). After screening, the SSR markers rated as excellent were used for the amplification and analysis of all 687 black locust samples. The same 20-μL reaction volume was used for population SSR amplification, and the PCR was carried out using the Touchdown program described by Guo et al. (2018). The PCR products were then stored in a freezer at −20 °C until retrieved the same afternoon (Ruibio BioTech Co., Ltd., Beijing, China) and separated using an ABI 3730XL DNA capillary electrophoresis analyzer (Applied Biosystems, Foster City, CA, USA).

Data analysis
Genetic diversity
Allele sizes of SSRs were converted into various formats for further analysis using Convert ver. 1.3.1 (Glaubitz 2004). Genetic diversity parameters, including the number of different alleles (N a ), effective number of alleles (N e ), observed heterozygosity (H o ), expected heterozygosity (H e ) and Shannon's information index (I), were calculated for each SSR locus in four populations using GenAlEx ver. 6.501 (Peakall and Smouse 2005; Rod and Smouse 2012). The inbreeding coefficient (F IS ), Wright′s fixation index (F IT ), and the fixation index (F ST ) for each microsatellite locus and population were estimated using FSTAT ver. 2.9.3 (Goudet 2001). Polymorphism information content (PIC) and gene diversity (H) were calculated using PowerMarker ver. 3.25 (Liu and Muse 2005). Whether the population was at Hardy-Weinberg equilibrium (HWE) was assessed based on Markov chain iterations, and a neutrality test was performed for all loci, using Arlequin ver. 3.5 (Excoffier and Lischer 2010). Subsequently, null alleles were detected using Micro-Checker ver. 2.2.3 (Oosterhout et al. 2004). GENECAP ver. 1.4 was employed to detect samples with identical genotypes (Wilberg and Dreher 2004).
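As a rough illustration of the per-locus statistics named above, the following Python sketch computes N a , N e , H o , H e and I from diploid genotype calls. It assumes the textbook definitions (H e = 1 − Σp², N e = 1/Σp², I = −Σp ln p with natural logarithms); the function name `locus_diversity` and the toy genotypes are hypothetical, and the sketch is not meant to reproduce GenAlEx output exactly.

```python
import math
from collections import Counter


def locus_diversity(genotypes):
    """Per-locus diversity statistics for diploid genotype calls.

    genotypes: list of (allele_a, allele_b) tuples; None marks a missing call.
    Definitions used here are the standard ones and are an assumption about
    what the cited software reports.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    counts = Counter(allele for g in called for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    return {
        "Na": len(freqs),                                # number of different alleles
        "Ne": 1.0 / homozygosity,                        # effective number of alleles
        "Ho": sum(a != b for a, b in called) / n,        # observed heterozygosity
        "He": 1.0 - homozygosity,                        # expected heterozygosity
        "I": -sum(p * math.log(p) for p in freqs),       # Shannon's information index
    }


# Toy data (hypothetical allele sizes in bp), with one missing call.
example = [(100, 100), (100, 104), (104, 104), (100, 104), None]
stats = locus_diversity(example)
print(stats)
```

With two equally frequent alleles, as in the toy data, N e equals N a and I equals ln 2, which is a quick sanity check on the definitions.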

Differentiation analysis
An analysis of molecular variance (AMOVA) was performed in GenAlEx ver. 6.501 to partition the total genetic variation among and within populations, and the number of private alleles for each population (PA) and Nei's genetic distance were also estimated (Rod and Smouse 2012; Peakall and Smouse 2005). The pairwise genetic distances [F ST /(1 − F ST )] (Rousset 1997) between pairs of provinces were also estimated.
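The linearized distance F ST /(1 − F ST ) is a simple transformation of a pairwise F ST value. A minimal sketch follows; the function name `linearized_fst` is hypothetical, and the two inputs are the smallest and largest pairwise F ST values reported in the Results of this paper.

```python
def linearized_fst(fst):
    """Return F_ST / (1 - F_ST), the linearized pairwise distance (Rousset 1997)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("F_ST must lie in [0, 1)")
    return fst / (1.0 - fst)


# Smallest and largest pairwise F_ST values reported in the Results.
for label, fst in [("Liaoning vs. Gansu", 0.008), ("Shanxi vs. Minquan", 0.053)]:
    print(f"{label}: F_ST = {fst:.3f}, linearized = {linearized_fst(fst):.4f}")
```

For values this small the transformation changes little, since F ST /(1 − F ST ) is approximately F ST when F ST is close to zero.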

Population structure
The Bayesian model-based clustering algorithm in the software package structure ver. 2.2.2 was used to estimate the population genetic structure based on the screened SSR markers (Pritchard et al. 2000). In this study, for each value of K (K = 1-20), 10 independent runs were performed with a burn-in period of 100,000 iterations and 1,000,000 Markov chain Monte Carlo replications, with the remaining parameters set to their default values. The online program Structure Harvester (Earl and Vonholdt 2012) was used to determine the optimal value of K based on the ΔK model developed by Evanno et al. (2005). To verify the K values, the calculations were repeated 10 times from K min = 1 to K max = 20 using the default parameters in Maverick software (Verity and Nichols 2016). The programs Clumpak ver. 1.1.2 (Kopelman et al. 2015) and distruct ver. 1.1 (Rosenberg 2003) were used to create the bar plot of the probability of membership based on the Q-matrix results.
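The ΔK criterion can be sketched in a few lines. The version below assumes one common formulation, the absolute second difference of the mean log-likelihood across replicate runs divided by the standard deviation of the log-likelihood at K; implementations differ in detail, so this may not match Structure Harvester exactly. The function name `delta_k` and all log-likelihood values are hypothetical toy data, not results from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Delta-K per K from replicate log-likelihoods.

    lnp_by_k: dict mapping K -> list of ln P(D|K) values from replicate runs.
    Delta-K is undefined for the smallest and largest K tested, and for any K
    whose replicate runs have zero standard deviation.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            sd = stdev(lnp_by_k[k])
            if sd > 0:
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                result[k] = second_diff / sd
    return result


# Hypothetical toy log-likelihoods (two replicate runs per K).
toy = {
    1: [-1001.0, -999.0],
    2: [-951.0, -949.0],
    3: [-801.0, -799.0],
    4: [-791.0, -789.0],
    5: [-786.0, -784.0],
}
scores = delta_k(toy)
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

Because ΔK cannot be evaluated at the smallest K, it can never select K = 1, which is one reason to confirm the result with a second method, as done here with Maverick.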

Principal coordinates and Mantel′s analysis
Principal coordinate analysis (PCoA) based on genotype data of SSR markers was used to examine differences among and within populations and geographic regions using GenAlEx ver. 6.501.
Mantel′s test of the genetic and geographic distances in the clone populations was performed using GenAlEx ver. 6.501 (Peakall and Smouse 2005;Rod and Smouse 2012), with geographic distance depending on the measured latitude.
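To make the logic of a Mantel test concrete, the sketch below correlates the upper triangles of two distance matrices and builds a null distribution by jointly permuting the rows and columns of one of them. It is a generic one-sided permutation version written with the Python standard library; the function names, the number of permutations, and the toy matrices are hypothetical, and GenAlEx may differ in its details.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo_vec = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], geo_vec)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel populations in the genetic matrix
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example: four hypothetical sites on a line; genetic distance is
# exactly proportional to geographic distance, so r should be 1.
positions = [0.0, 1.0, 3.0, 7.0]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[2.0 * d for d in row] for row in geo]
r, p = mantel(gen, geo, permutations=199, seed=1)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Whole rows and columns are permuted, rather than individual cells, because the pairwise distances in a matrix are not independent of one another.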

Selection of SSR markers for black locust populations in China
Forty-eight SSR markers were successfully amplified and passed the primer-screening process. We analyzed the primers before evaluating the genetic diversity among 687 samples of black locust from five provinces. GENECAP ver. 1.4 was used to compare the genotypes of all pairs of individuals, and no pairs of samples had identical genotypes. Four markers had more than 20% missing locus data. Two loci deviated significantly from the expectations of a neutral equilibrium model based on the Ewens-Watterson test. Additionally, null alleles were detected for nine markers among the 48 SSR loci. Finally, 12 pairs of SSR markers were deleted, and the remaining 36 markers were used for further analysis of black locust clonal populations in China (Table S1).
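The missing-data criterion used in this screening step can be illustrated with a short sketch that flags loci whose proportion of missing genotype calls exceeds 20%. The locus names, genotype values, and function names are hypothetical; the neutrality and null-allele filters are not reproduced here because they depend on the dedicated software cited in the Methods.

```python
def missing_rate(calls):
    """Fraction of samples with no genotype call (None) at one locus."""
    return sum(call is None for call in calls) / len(calls)


def split_by_missingness(calls_by_locus, max_missing=0.20):
    """Return (kept, dropped) locus names; a locus is dropped when its
    missing rate is strictly greater than max_missing."""
    kept, dropped = [], []
    for locus, calls in calls_by_locus.items():
        if missing_rate(calls) > max_missing:
            dropped.append(locus)
        else:
            kept.append(locus)
    return kept, dropped


# Hypothetical toy data: 10 samples per locus.
toy_loci = {
    "SSR_A": [(100, 104)] * 9 + [None],       # 10% missing, kept
    "SSR_B": [(200, 200)] * 7 + [None] * 3,   # 30% missing, dropped
}
kept, dropped = split_by_missingness(toy_loci)
print("kept:", kept, "dropped:", dropped)
```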

Genetic diversity of black locust in six improved variety bases
A total of 587 alleles were detected at 36 SSR loci in 687 samples, with the number of alleles per locus ranging from 8 (at locus    (Table 1). At the population level, Shandong had the largest numbers of different and private alleles (Table S2) among all populations, probably due to the larger sample size (Glaubitz 2004). Correspondingly, Mengjin and Minquan, which had the fewest individuals, also had the fewest different and private alleles. The highest I (1.504), H (0.659), and PIC (0.628) values were observed in the Shandong population, indicating rich genetic diversity. The second most diverse population was that of Shanxi, with I, H, and PIC values of 1.393, 0.627, and 0.593, respectively. The H o of the total population was smaller than the H e , and the F IS was consistently positive, ranging from 0.053 (Liaoning) to 0.151 (Shandong), with an average of 0.091 (Table S3).

Population structure analysis
A Bayesian model implemented in structure was used to assess black locust population structure. The highest ΔK value was detected at K = 4, which represented the most probable number of clusters (Fig. 1). Furthermore, the K-value was confirmed using Maverick software, which also determined that K = 4 was the optimal value, indicating that all individuals should be divided into four clusters in this study (Fig. 2). From the results of 10 independent runs in structure with K = 4, the major mode accurately produced identical patterns of individual assignments for each run (Fig. S2). Cluster 2 contained the largest number of individuals (n = 289), followed by Cluster 3 (n = 182) and Cluster 1 (n = 118). Cluster 4 contained the fewest individuals (n = 98). The average Q-values were similar and were calculated to be 0.841 (Cluster 1), 0.912 (Cluster 2), 0.859 (Cluster 3), and 0.917 (Cluster 4). Among them, individuals from Shandong were mainly distributed in Cluster 1 (n = 107, 32.23%) and Cluster 3 (n = 179, 53.92%), whereas individuals from Gansu (n = 144, 95.36%), Liaoning (n = 61, 98.39%), Mengjin (n = 17, 73.91%), and Minquan (n = 21, 91.30%) were mainly distributed in Cluster 2. By contrast, all individuals in Shanxi (n = 96) were placed in Cluster 4, indicating the presence of a strong population structure (Table S4). In addition, the results of genetic structure analysis with multiple K values (Fig. 3) and PCoA based on unweighted genetic distances were also consistent (Fig. 4), indicating a degree of genetic differentiation among the six populations of black locust in China.

Population differentiation analysis
The degree of genetic differentiation (F ST ) between any two populations was calculated for all six populations (Table 2). All pairwise F ST combinations were significant (P < 0.05), with an overall F ST value of 0.028 (< 0.05). Among all populations, F ST values for global and pairwise multilocus analysis ranged from 0.008 (Liaoning vs. Gansu) to 0.053 (Shanxi vs. Minquan).
AMOVA was used to evaluate components of the variance among and within populations, as well as among and within clusters. The results revealed that variation was low among populations (7%) and clusters (9%) and high within populations (93%) and clusters (91%) (Table 3).
Nei′s unbiased genetic distance (D) and genetic identity were determined between pairs of populations. Among all pairs, Liaoning and Gansu exhibited the smallest D (0.010) and the largest genetic identity (0.990). By contrast, Minquan and Shanxi exhibited the largest D (0.158) and smallest genetic identity (0.854) (Table 4). In addition, Mantel′s test using data from the six populations revealed that there was no correlation between genetic distance and geographic distance (Fig. 5).
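Nei′s genetic distance and genetic identity are linked by D = −ln(identity), so each reported pair of values can be checked against the other. The short sketch below does this for the two extreme population pairs reported above; the function names are hypothetical.

```python
import math


def nei_identity(distance):
    """Genetic identity implied by Nei's genetic distance D (identity = exp(-D))."""
    return math.exp(-distance)


def nei_distance(identity):
    """Nei's genetic distance implied by a genetic identity (D = -ln identity)."""
    return -math.log(identity)


# D values reported for the closest and most distant population pairs.
for label, d in [("Liaoning vs. Gansu", 0.010), ("Minquan vs. Shanxi", 0.158)]:
    print(f"{label}: D = {d:.3f}, identity = {nei_identity(d):.3f}")
```

Rounded to three decimals, the implied identities are 0.990 and 0.854, matching the reported values.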

Discussion
To establish suitable breeding, conservation, and management strategies, we need to collect and identify germplasm resources and study their genetic differences. To date, research on the genetic diversity of R. pseudoacacia in China has been limited; in particular, the levels of genetic diversity among R. pseudoacacia populations and the types of genetic resources available have been unclear. For these reasons, 687 individuals representing four known high-quality varieties and two secondary provenances were collected, comprising 641 clones and 46 seedlings. A set of neutral SSR markers was obtained via a series of screening procedures and used to assess germplasm diversity. Additionally, because individuals collected from different regions may have the same genotype, GENECAP ver. 1.4 was used to analyze all strains. Of the 687 samples, no individuals shared the same genotype, indicating that all samples could be used for subsequent analysis.
In the present study, the average Shannon′s information index (I) across the 687 black locust genotypes, estimated with 36 neutral SSR markers, was 1.302. This value was higher than that reported for R. pseudoacacia samples from 10 main planting districts in China using AFLP and ISSR markers (Huo et al. 2009; Sun et al. 2009). The higher I value may be related to the use of different types of molecular markers and/or different sources of materials (Lu et al. 2020; Xiong et al. 2019). Subsequently, results obtained with 12 EST-SSR markers in 123 black locust cultivars in China were compared with those of our study, showing lower values for N a , N e and I in our populations (Dong et al. 2019a, b). The high values of the above parameters observed in this study may be due to the simultaneous use of two types of SSR markers and the relatively large sample size compared with a population growing at the geographic origin of black locust (Guo et al. 2018). In addition, values of genetic diversity indices including N a , I, PIC, and H in the present study were higher than those from previous research, which may be due to the large number of sequences analyzed. Species with higher genetic diversity may have stronger adaptability to a changing environment, and such adaptability may also reflect the ability of a particular genotype to tolerate multiple conditions (Schaal et al. 1991; Hawtin et al. 1996). Although black locust is an exotic tree species, the genetic diversity of this species′ germplasm resources in China and that of native populations in the United States are similar. Furthermore, the analysis of variance using SPSS ver. 24 revealed that there was no significant difference in the level of diversity between the two groups (Guo et al. 2018). For these reasons, black locust is widely distributed and exhibits strong adaptability in China (Table S5). H e is a critical measurement of genetic diversity. In this study, the average H o and H e were 0.551 and 0.608, respectively.
Similar results were obtained by Lian et al. (2002) (H o = 0.615, H e = 0.773) and Mishima et al. (2009) (H o = 0.661, H e = 0.739) using SSR markers. In addition, at both the locus and population levels, we found that most H e values were higher than the H o values, indicating a possible heterozygote deficiency. Furthermore, the F IS value was 0.100 at the locus level and 0.091 at the population level, suggesting that self-pollination or the Wahlund effect may occur in this species (Selkoe and Toonen 2006).
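The link between heterozygote deficiency and a positive inbreeding coefficient can be illustrated with the common approximation F IS ≈ 1 − H o /H e . The sketch below applies it to the average heterozygosities reported in this study; it is only a back-of-the-envelope check under that assumed formula, and the function name is hypothetical. It is not the estimator implemented in FSTAT.

```python
def fis_from_heterozygosity(ho, he):
    """Approximate inbreeding coefficient, F_IS = 1 - Ho / He (assumed formula)."""
    if he <= 0:
        raise ValueError("He must be positive")
    return 1.0 - ho / he


# Average observed and expected heterozygosity reported in this study.
approx_fis = fis_from_heterozygosity(0.551, 0.608)
print(f"approximate F_IS = {approx_fis:.3f}")
```

The result is close to, but not identical with, the reported values of 0.100 and 0.091. A plausible reason is that an average of per-locus or per-population coefficients generally differs from a coefficient computed from averaged heterozygosities.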
F ST is usually used to measure the degree of genetic differentiation between populations. All pairwise comparisons of F ST indicated that the differences were significant, and most (80%) were less than 0.05 (Table 2), implying that the genetic differentiation among populations was significant but low, which is consistent with the findings of Guo et al. (2018). Moreover, AMOVA results revealed that there was only 7% genetic variation among populations and 9% genetic variation among clusters. These results indicate that the observed variation was mainly attributable to within-population and within-cluster differences. The long growth cycle and outcrossing dominance of woody plants may lead to an increase in genetic diversity among individuals and a reduction in differences among populations and clusters. In addition, the analysis of Nei's unbiased genetic distance (D) and identity showed that there was genetic differentiation among populations, indicating that the variation among populations may not be related to their geographic distribution. Furthermore, the Mantel test indicated that in the populations sampled, genetic variation was not significantly correlated with geographic distance. These results are consistent with those previously reported (Guo et al. 2018; Sun et al. 2009; Hamrick et al. 1989; Yang et al. 2004).
In the population structure analysis using structure and Maverick software, four clusters were identified within the 687 black locust samples. Moreover, genetic structure analysis using multiple K values and PCoA produced results consistent with those of the tests described above. Interestingly, we found that samples from Shandong Province were assigned mainly to two clusters (Clusters 1 and 3). Additionally, genetic diversity analysis demonstrated that the diversity and N a are generally higher in populations from Shandong Province. Samples from Gansu, Liaoning, Mengjin, and Minquan were mainly grouped into Cluster 2. Based on these results, we propose that individuals within the same cluster share a common origin (i.e., kinship), whereas their relationships with individuals in other clusters are more distant. Despite the loss of information about the geographic origin, introduction, and planting of the sampled trees, we could roughly determine the source of each individual within a group of black locust; for instance, the genetic relationship between trees from Shanxi and those from the other five populations is distant, whereas the genetic relationships of strains sampled in Gansu, Liaoning, Mengjin, and Minquan are closer. Taken together, this is the first broad analysis of the population genetic diversity and differences in an extensive collection of black locust germplasm resources sampled from multiple populations in China. Our results will help to elucidate the genetic relationships among populations of this species that lack introduction information. In addition, this research provides valuable genetic information for the breeding and conservation of black locust in China.

Conclusion
In this study, we conducted the first evaluation of black locust germplasm resources in China. We (1) identified a large number of germplasm resources representing several varieties in China, (2) found that the populations contain high genetic diversity, (3) observed that heterozygote deficiency existed in all populations, and (4) determined that geographic distance is not the main driver of black locust genetic structure. These results provide comprehensive and important information for the breeding of black locust resources in China.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.