Analysis of genetic diversity in Prunus sibirica L. in Inner Mongolia using SCoT molecular markers

Population genetic diversity contributes to the protection and utilization of germplasm resources, especially via genetic breeding. In the present study, start codon targeted polymorphism (SCoT) molecular markers were used to study the genetic diversity of 278 individuals from 10 Prunus sibirica L. populations in Inner Mongolia. A total of 289 polymorphic bands were amplified with 23 SCoT primers, showing a polymorphism percentage of 98.87% and an average of 12.6 polymorphic bands per primer. The SCoT21, SCoT32, and SCoT53 primers amplified the most bands (17 each), with a polymorphism percentage of 100%. SCoT25 amplified the fewest bands (9), with a polymorphism percentage of 90%. Therefore, SCoT molecular markers were shown to be highly polymorphic and suitable for genetic diversity studies of P. sibirica in Inner Mongolia. The analysis of molecular variance (AMOVA) showed that 39% of the observed genetic variation occurred among populations and 61% occurred within populations, indicating that genetic variation within populations was greater than that among populations. The results of the unweighted pair-group method with arithmetic means (UPGMA) cluster analysis, principal coordinate analysis (PCoA), and STRUCTURE analysis were largely consistent and divided the 278 individuals from the 10 populations into 2 groups. The results indicate that the efficient SCoT marker-based genetic diversity analysis of P. sibirica in Inner Mongolia can provide a reference for P. sibirica variety breeding and resource development.


Introduction
Prunus sibirica L. (Siberian apricot) is a member of the Rosaceae family and is the dominant species on mountain dunes and dry steppes (Yu 1979). P. sibirica is an important ecological and economic tree species in China, where it is mainly distributed in Inner Mongolia, Heilongjiang, Jilin, Liaoning, Gansu, Hebei, and Shanxi. The P. sibirica forest in Inner Mongolia covers 47.44 hectares.
P. sibirica shows strong cold and drought resistance and high nutritional and medicinal value (Dong et al. 1991). However, the yield of P. sibirica in China varies widely and is very unstable because of long-term seed reproduction, delayed breeding, and the influence of climatic factors. In addition, frost can severely reduce the yield of P. sibirica, greatly restricting its industrialization (Ma et al. 2007; Yao et al. 2007; Jin et al. 2018). P. sibirica shows self-incompatibility, resulting in large genetic differences among individual genetic resources (Li et al. 2011a, b).
Biodiversity refers to all of the variation among living things on earth and is the basis of survival and biological development (Rao et al. 2002). Genetic diversity is an important component of biological diversity and refers to the genetic differentiation of all living organisms (Lu 2018). Dong et al. (2018) studied the diversity of 19 quantitative traits of P. sibirica from different populations. The coefficients of variation of the economic traits of fruit and kernel were large, and that of fruit yield was the largest, indicating that P. sibirica has rich germplasm resources with great potential for breeding good varieties. Wang et al. (2019) studied the seed traits of P. sibirica from 19 populations and found rich genetic variation in seed traits among the populations. Li (2014) studied the genetic diversity of P. sibirica and Prunus armeniaca in North China using inter-simple sequence repeat (ISSR), sequence-related amplified polymorphism (SRAP), and simple sequence repeat (SSR) analyses. The results showed that the natural population of P. sibirica in China exhibits a high level of genetic diversity, with high genetic differentiation within its populations but low genetic differentiation and moderate gene flow among populations.
The SCoT marker, first proposed by Collard and Mackill (2009) in rice, is a gene-targeted molecular marker based on a single-primer amplification reaction. The strategy is to amplify the genome with a single primer designed from the conserved sequences flanking the ATG translation start codon of plant genes. The goal is to reveal dominant polymorphic markers in candidate functional gene regions via a simple procedure that detects abundant polymorphisms. Compared with random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), SSR, and ISSR markers, SCoT markers are produced efficiently and can be linked to traits, making them convenient for use in molecular marker-assisted breeding. SCoT markers have been used in various plant species, such as Phoenix dactylifera (Somayeh Saboori et al. 2020), Mangifera indica (Li Zhou et al. 2020), and Diospyros kaki (Changfei Guana et al. 2020).
Genetic resources for P. sibirica are extremely abundant and present great developmental potential, and they can provide an important genetic basis for the improvement of Chinese P. sibirica and the breeding of new varieties. In recent years, research on the genetic diversity of P. sibirica has mostly been conducted in northeastern and northern China, although P. sibirica is mainly distributed in Inner Mongolia. Moreover, few studies (Bao et al. 2016) have been conducted on the genetic diversity of P. sibirica in Inner Mongolia. Therefore, we used SCoT molecular markers to analyze the genetic diversity of 278 individuals from 10 populations of P. sibirica in Inner Mongolia and to reveal the genetic diversity and genetic structure among populations and individuals. This study provides a scientific basis for the cultivation and exploitation of the abundant genetic resources of P. sibirica in Inner Mongolia.

Plant materials
The materials were obtained from the P. sibirica Germplasm Resource Garden of the Inner Mongolia Fine Variety Breeding Center (Fig. 1). A total of 278 individuals were collected from 10 populations (Table 1). Fresh young leaves were placed in a bucket containing liquid nitrogen, brought back to the laboratory, and stored in a -80°C freezer for further experiments.

DNA extraction
Genomic DNA was extracted from fresh leaves using a Plant DNA Extraction Kit (TIANGEN, China). The quality and purity of the extracted DNA were assessed by spectrophotometry (260/280 nm absorbance ratio) using a BioPhotometer (Thermo Fisher Scientific, USA) and by 1% agarose gel electrophoresis (with a nucleic acid stain). The DNA was then stored in a -20°C freezer for further experiments.

SCoT-PCR
Eighty SCoT primer sequences were selected by referring to Collard and Mackill (2009) and were synthesized by Shanghai Sangon Biological Engineering Technology and Services. From these 80 primers, 23 highly polymorphic and repeatable primers were selected to evaluate the P. sibirica accessions (Table 2).
A 96-well gradient PCR instrument was used for the amplification reactions. PCR was performed in a 20 μL volume containing 10 μL of 2× Taq Mix, 1.6 μL (30 ng/μL) of genomic DNA, 0.8 μL (10 mol·L⁻¹) of each primer, and 7.6 μL of ddH₂O. SCoT-PCR amplification was conducted with initial denaturation for 5 min at 95°C, followed by 40 cycles of 45 s at 94°C, 45 s at 50 to 54°C, and 2 min at 72°C, with a final extension of 7 min at 72°C and holding at 4°C. The PCR products were separated in a 1.5% agarose gel using 1× TBE running buffer, stained, and photographed, and the records were preserved.

Data analysis
SCoT amplicons were scored in a binary matrix as present (1) or absent (0) for each sample based on the corresponding standard size. Vague bands that could not be easily detected were not scored. POPGEN32 software was used to calculate the total number of bands (NPB) and the number of polymorphic bands (PPB) obtained with the SCoT primers and to calculate the genetic diversity indices of P. sibirica: Nei's gene diversity index (H), Shannon's information index (I), the number of alleles (Na), the effective number of alleles (Ne), total genetic diversity (Ht), population genetic diversity (Hs), the coefficient of genetic differentiation (Gst), and gene flow (Nm). Ntsys-2.0 software was used to calculate Nei's genetic distances from the binary (0, 1) matrices, and the unweighted pair-group method with arithmetic means (UPGMA) in the SAHN program was used for the cluster analysis of the 10 populations to construct the cluster relationship tree diagram. MEGA7 software was used for the UPGMA cluster analysis of the 278 individuals based on Nei's genetic similarity matrix. An analysis of molecular variance (AMOVA) was performed in GenAlEx software to assess genetic differentiation among and within populations. A principal coordinate analysis (PCoA) was performed on the binary matrix using GenAlEx software. STRUCTURE software was used to determine the genetic structure of the studied population, and K was tested from 1 to 10 with ten replicates. IBD software was used to perform a Mantel test of the correlation between genetic distance and both geographic distance and altitude.
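The scoring and per-locus indices described above can be illustrated with a minimal sketch. It assumes the common treatment of dominant markers, in which the null-allele frequency is estimated as the square root of the band-absent frequency under Hardy-Weinberg equilibrium; the exact estimators implemented in POPGEN32 may differ. The function names and the toy matrix are hypothetical and are not data from this study.

```python
import math

def locus_stats(scores):
    """Return (polymorphic, Na, Ne, H, I) for one locus scored as 0/1."""
    n = len(scores)
    # Null-allele frequency from the band-absent frequency (assumes HWE).
    q = math.sqrt(scores.count(0) / n)
    p = 1.0 - q
    polymorphic = 0 < sum(scores) < n      # band present in some, not all
    na = 2 if polymorphic else 1           # observed number of alleles
    homozygosity = p * p + q * q
    ne = 1.0 / homozygosity                # effective number of alleles
    h = 1.0 - homozygosity                 # Nei's gene diversity
    # Shannon's information index with natural logarithms.
    i = -sum(f * math.log(f) for f in (p, q) if f > 0)
    return polymorphic, na, ne, h, i

def summarize(matrix):
    """matrix: rows are individuals, columns are loci (bands)."""
    stats = [locus_stats(list(col)) for col in zip(*matrix)]
    n_loci = len(stats)
    ppb = 100.0 * sum(s[0] for s in stats) / n_loci   # % polymorphic bands
    means = [sum(s[k] for s in stats) / n_loci for k in range(1, 5)]
    return ppb, means                                  # means: Na, Ne, H, I

# Hypothetical 4 individuals x 3 bands, for illustration only.
toy = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
]
ppb, (mean_na, mean_ne, mean_h, mean_i) = summarize(toy)
print(ppb, mean_na, mean_ne, mean_h, mean_i)
```

Averaging the per-locus values, as done here, is one simple way to obtain the per-primer or overall means reported in tables of this kind.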

SCoT polymorphism analysis
In the SCoT analysis, 23 SCoT primers were used for marker amplification in 278 P. sibirica individuals. A total of 292 scorable bands were produced, of which 289 were polymorphic, with a mean of 12.6 polymorphic bands per primer. The polymorphism percentage was 98.87%. Among the 23 SCoT primers, SCoT21, SCoT32, and SCoT53 amplified the most bands (17 each), with a polymorphism percentage of 100%. SCoT25 amplified the fewest bands (9), with a polymorphism percentage of 90%. The 23 SCoT primers produced Na values ranging from 1.90 to 2.00, with a mean of 1.99; Ne values ranging from 1.28 to 1.60, with a mean of 1.44; H values ranging from 0.18 to 0.34, with a mean of 0.26; and I values ranging from 0.30 to 0.50, with a mean of 0.41 (Table 3).
(Table 5). The AMOVA had 9 degrees of freedom among populations and 267 degrees of freedom within populations. The sum of squares was 4,622.89 among populations and 7,385.47 within populations. The estimated variance component was 17.61 among populations and 27.56 within populations. The AMOVA showed that 39% of the total genetic variance occurred among populations, while 61% occurred within populations (Table 6). This result indicates that genetic differentiation among populations is lower than that within populations.
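The AMOVA percentages follow directly from the two variance estimates reported above. The short check below only reproduces that arithmetic; the function name is hypothetical, and it does not reimplement the AMOVA itself.

```python
def variance_percentages(among, within):
    """Share of total variance attributed to each level, in percent."""
    total = among + within
    return 100.0 * among / total, 100.0 * within / total

# Variance estimates reported in the text above.
among_pct, within_pct = variance_percentages(17.61, 27.56)
print(round(among_pct), round(within_pct))  # 39 61
```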
Analysis of the among-population genetic structure
POPGEN32 was used to analyze the genetic distance and genetic similarity of the 10 P. sibirica populations in Inner Mongolia. The analyses of Nei's genetic similarity and Nei's genetic distance showed that the population genetic distance ranged from 0.03 to 0.25, while the genetic similarity ranged from 0.77 to 0.97. The genetic similarity between WJG and KSK was the highest, at 0.97, and their genetic distance was the lowest, at 0.03. The genetic similarity between ZLT and WJG was the lowest, at 0.77, and their genetic distance was the greatest, at 0.25 (Table 7). Based on Nei's genetic distance, the UPGMA method was used for cluster analysis. The results showed that the 10 populations were divided into two major groups at a genetic distance threshold of 0.22 (Fig. 2).
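As an illustration of how UPGMA builds such a tree from a distance matrix, the sketch below implements the algorithm in plain Python. It is not the NTSYS or MEGA7 implementation, and the four labels and distances are hypothetical values chosen only to resemble the reported range; they are not taken from Table 7.

```python
def upgma(names, dist):
    """UPGMA clustering on a symmetric distance matrix (list of lists).

    Returns a list of merges (label_a, label_b, distance), where distance
    is the average distance at which the two clusters are joined.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}   # id -> (label, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest average distance.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        merges.append((label_a, label_b, height))
        # Distance to the new cluster is the size-weighted mean.
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        clusters[next_id] = ("(" + label_a + "," + label_b + ")", size_a + size_b)
        next_id += 1
    return merges

# Hypothetical populations and distances, for illustration only.
labels = ["A", "B", "C", "D"]
distances = [
    [0.00, 0.03, 0.20, 0.24],
    [0.03, 0.00, 0.22, 0.26],
    [0.20, 0.22, 0.00, 0.05],
    [0.24, 0.26, 0.05, 0.00],
]
for merge in upgma(labels, distances):
    print(merge)
```

Cutting the resulting tree at a chosen distance threshold, as done at 0.22 in the analysis above, yields the groups.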

Analysis of genetic structure within populations
Based on Nei's genetic similarity, MEGA7 software was used to perform the clustering analysis of 278 P. sibirica individuals (Fig. 3), with the PCoA performed using GenAlEx software (Fig. 4). The PCoA results showed that the 278 P. sibirica individuals were divided into groups I and II. These results were similar to those of the UPGMA clustering analysis. The two groups were divided
To further explore the genetic relationships within populations of P. sibirica, the population structure of 278 P. sibirica individuals was evaluated using STRUCTURE version 2.3.4 software (Beaumont et al. 2001). ΔK values computed for all classes indicated a strong signal for K = 2 (Fig. 5), and K = 2 provided the most rational arrangement of P. sibirica in different regions. The STRUCTURE analysis divided the 278 P. sibirica individuals into 2 groups (Fig. 6). Group I included all individuals in the KSK, WJG, KZH, and KYZ

Discussion
In this study, a genetic diversity analysis was performed using SCoT markers to assess phylogenetic relationships among 278 P. sibirica individuals from 10 different populations in Inner Mongolia. Twenty-three SCoT primers were used for the amplification of genomic DNA from P. sibirica. A total of 292 clear bands were obtained, 289 of which were polymorphic. The average number of polymorphic bands per primer was 12.6, with a polymorphism percentage of 98.87%. In previous studies, researchers used SRAP (Ai et al. 2011), AFLP (Wang 2008), RAPD (Lu 2008), and ISSR (Liu et al. 2007, 2011; Duan et al. 2010; Li et al. 2009) molecular markers to analyze the genetic diversity of P. sibirica in different regions, and the polymorphism ranged from 58 to 90%. By comparison, the SCoT markers used here were more polymorphic than these other markers. Therefore, SCoT markers may be suitable for studying genetic diversity in P. sibirica. A previous study of the genetic diversity of Prunus armeniaca (Li 2014)  .94%, respectively. We showed that the genetic diversity of the P. sibirica populations in Inner Mongolia is higher than that of Prunus armeniaca and P. sibirica. This result is consistent with the conclusions of other scholars that the genetic diversity of outbred species is higher (Hamrick 1989; Zheng et al. 2008).
As a perennial wild resource, P. sibirica is widely distributed in Inner Mongolia and exhibits a large distribution area, long-term evolution, and considerable diffusion as well as ecological diversity, leading to rich genetic diversity. According to Wright (1972), when the Fst (Gst, genetic differentiation coefficient) value is 0 to 0.05, the genetic differentiation among populations is low; when it is 0.05 to 0.15, the differentiation is moderate; when it is 0.15 to 0.25, the differentiation is high; and when it is greater than 0.25, the differentiation is very high. In our research, the Gst was 0.36. The results showed that the genetic differentiation coefficient of P. sibirica in Inner Mongolia was relatively high, with among-population genetic variation of 39% and within-population genetic variation of 61%. Our results are similar to those of previous studies: Wang (2019), Liu et al. (2012), and Ma (2013), who used SSR and ISSR markers for P. sibirica and wild apricot genetic diversity analyses, showed that within-population variation dominated and that among-population variation was much lower than within-population variation.
The reasons for the high within-population genetic differentiation of P. sibirica in Inner Mongolia may include the following: 1. Gene flow is an important factor affecting genetic differentiation in a population when Nm < 1; conversely, stronger gene flow can effectively prevent the genetic differentiation caused by genetic drift. The Nm value for P. sibirica in Inner Mongolia was 0.89, indicating that genetic drift was not the main factor influencing the genetic differentiation of the P. sibirica populations in Inner Mongolia. Some gene flow exists among the P. sibirica populations in Inner Mongolia, but its intensity is relatively low. 2. The main breeding method of P. sibirica is outcrossing: self-incompatibility prevents self-pollination and inbreeding, and reproduction relies mainly on natural pollination or pollination by visiting insects such as bees, flies, and a few butterflies (Liu et al. 2010; Liu 2010). Although pollen can be spread by insects and by wind over long distances, dispersal is restricted by the large distribution area and by discontinuities in the population distribution of P. sibirica caused by habitat fragmentation (Wu et al. 2015), thus weakening the gene flow within populations of P. sibirica. 3. Some populations have been destroyed or their habitats have been degraded, resulting in a gradual fragmentation of the population distribution and limiting the gene flow within the populations. Therefore, P. sibirica in Inner Mongolia shows high within-population genetic differentiation, and topography, landforms, and human activities could be potential factors influencing the genetic differences.
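The reported Gst and Nm values can be cross-checked with a short calculation. The sketch assumes the widely used estimator Nm = 0.5(1 - Gst)/Gst, which reproduces the reported value of 0.89 from Gst = 0.36; whether this is exactly the estimator applied by POPGEN32 in this study is an assumption. The Ht and Hs arguments are placeholders, because their values are not given in this section.

```python
def gst(ht, hs):
    """Coefficient of genetic differentiation from total (Ht) and
    within-population (Hs) gene diversity."""
    return (ht - hs) / ht

def gene_flow(gst_value):
    """Gene flow estimate Nm = 0.5 * (1 - Gst) / Gst (assumed estimator)."""
    return 0.5 * (1.0 - gst_value) / gst_value

# Gst reported in the text; the result matches the reported Nm.
print(round(gene_flow(0.36), 2))  # 0.89
```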
In the present study, the genetic structure of P. sibirica populations in Inner Mongolia was analyzed. The clustering diagram, principal coordinate analysis, and STRUCTURE analysis showed that most of the populations with a similar geographic distribution clustered together; however, the Mantel test showed no significant correlation between genetic distance and geographic distance or between genetic distance and altitude for the P. sibirica studied (r = 0.04, P = 0.48; r = 0.03, P = 0.43). Therefore, the genetic differences among P. sibirica populations are not due to geographic distance or altitude. The clustering diagram, principal coordinate analysis, and STRUCTURE analysis of the 278 individuals divided the populations into two groups (I and II). KSK, WJG, KYZ, and KZH were clustered in group I; LC, BLY, WLS, AL, and ZLT were clustered in group II; and only the 22 individuals in the ZLA population were divided between the two groups. A certain degree of gene flow exists among the subgroups, and gene flow also affects the population structure of P. sibirica (Liu et al. 2012). This may occur for two reasons. First, Inner Mongolia contains large areas of sandy land, and its topography and landforms are very complex, which weakens the opportunities for gene flow among populations. Second, habitat fragmentation is an important factor that affects the composition and structure of the ecosystem and the genetic structure of a species.
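A Mantel test of the kind reported above correlates the entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The following is a minimal one-tailed sketch in plain Python, not the IBD software procedure; the matrices are hypothetical toy values, and the permutation count and seed are arbitrary choices.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def upper(matrix, order):
    """Upper-triangle entries after reordering rows and columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(gen, geo, permutations=999, seed=0):
    """Return (r, p) for a one-tailed Mantel permutation test."""
    rng = random.Random(seed)
    ids = list(range(len(gen)))
    geo_vec = upper(geo, ids)
    r_obs = pearson(upper(gen, ids), geo_vec)
    hits = 0
    for _ in range(permutations):
        perm = ids[:]
        rng.shuffle(perm)                  # relabel populations in one matrix
        if pearson(upper(gen, perm), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)

# Hypothetical genetic and geographic (km) distances, for illustration only.
gen = [
    [0.00, 0.03, 0.20, 0.24],
    [0.03, 0.00, 0.22, 0.26],
    [0.20, 0.22, 0.00, 0.05],
    [0.24, 0.26, 0.05, 0.00],
]
geo = [
    [0, 120, 310, 90],
    [120, 0, 200, 150],
    [310, 200, 0, 260],
    [90, 150, 260, 0],
]
r, p = mantel(gen, geo)
print(r, p)
```

A small r with a large p, as in the results above, means that the observed correlation is no stronger than what random relabeling of populations typically produces.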

Conclusion
This study revealed that P. sibirica shows high genetic diversity in Inner Mongolia. Therefore, improving the preservation and utilization of specific germplasms of different populations is necessary to avoid the disappearance of a large number of valuable genetic resources. Moreover, expanding the collection of resources to implement transfer protection will be conducive to better scientific research and resource protection in P. sibirica, will increase the gene flow among populations, and will improve the genetic diversity of P. sibirica. Based on resource collection, genetic diversity analyses, and genetic structure analyses, the breeding of P. sibirica varieties with high yields and frost resistance is being actively performed to meet the increasing demand for the industrial production of P. sibirica resources.