Background

Palaemonetes sinensis (Sollaud, 1911), also known as Chinese grass shrimp, a small freshwater shrimp belonging to the Palaemonidae family distributed in China, Myanmar, Vietnam, Japan, southeastern Siberia and Sakhalin and has both ecological and ornamental value [1,2,3]. In China, P. sinensis is commonly distributed in the Liaoning, Jilin, Heilongjiang, Hebei, Jiangsu, Fujian and Yunnan provinces [1]. P. sinensis is an attractive shrimp due to its striking appearance, appealing flavor [4] and use as bait for sport fishing [3]. Due to its delicious meat and high nutritional value, P. sinensis is very popular in both domestic and foreign markets [3,4,5,6]. However, as a consequence of environmental pollution and overharvesting, the P. sinensis population has gradually diminished. As such, there has been interest in P. sinensis aquaculture to potentially alleviate fishing pressure on the wild population by meeting consumer demand with farmed shrimp and through stock enhancement. Nevertheless, there are only a handful of studies investigating P. sinensis morphology and more work is needed to understand the biology of this species [3, 6]. At present, studies on the P. sinensis using molecular biology are rarely reported, only some studies have used microsatellite markers in other species of the Palaemonidae family, including Macrobrachium rosenbergii [7] and Macrobrachium nipponense [8]. In the past decade, microsatellite markers have been widely used in the genetic background research [9], and have been considered as very important molecular genetic markers for the analysis of genetic diversity, genetic structure and construction of genetic linkage maps. With the development of sequencing technology, vast quantities of transcriptome information were obtained, including of the identification of microsatellite markers. Ma et al. [10] identified 129 polymorphic microsatellite markers from transcriptomic analysis and analyzed their relationship to growth performance in Mud Crab (Scylla paramamosain). Research in related species of snapping shrimps with highly duplicated genomes proved that microsatellites acquired from the transcriptome were more likely to work effectively than those developed through traditional methods from the genome [11]. However, compared to other aspects of massive transcriptome data, the roles of transcriptome-derived microsatellite markers have not been investigated thoroughly. In prior work, a total of 17,019 microsatellite markers were obtained from the transcriptome of the P. sinensis [4]. Moreover, there are no studies investigating molecular marker based genetic diversity of P. sinensis, which has the potential to aid in the conservation and improvement of this shrimp species. In view of the broad distribution of its wild populations, the questions over whether they are genetically similar or do exhibit differences attributable to their locations or drainage basins are still unsolved. The objectives of the current study were to validate variations of transcriptome-derived microsatellite markers and to analyze their underlying genetic background of different wild P. sinensis populations. It is hoped that this work could make a positive contribution to the molecular genetic analyses of P. sinensis populations and eventually serve as a basis for the improvement and sustainable conservation of P. sinensis aquaculture in China.

Results

Polymorphisms of microsatellite loci

Seven wild populations of P. sinensis, totaling 319 individual shrimps (Fig. 1, Table 1), were screened for 16 microsatellite loci which were polymorphic in all populations using the 0.95 allele frequency criterion. The characteristics of the 16 microsatellite loci were summarized in Table 2. The most polymorphic locus was c1747_g1_i1 among the 16 loci, with the highest NA (13), NE (3.628), He (0.724) and Ar (7.478), and c2591_g1_i1 was the least polymorphic locus, with the lowest NA (2), NE (1.006), He (0.006) and Ar (1.213) values.

Fig. 1
figure 1

Map of P. sinensis samples collection sites. (see Table 1 for full description of the populations)

Table 1 List of populations used in this study and their geographic position of P. sinensis
Table 2 Characteristics of the microsatellite markers and their genetic variation statistics in P. sinensis

Hardy-Weinberg equilibrium test and linkage disequilibrium

Nine of the sixteen loci showed a highly significant departure from Hardy-Weinberg equilibrium (HWE) (P < 0.01), whereas the other seven loci showed no significant differences (Table 2). All populations, except for LSY, showed a highly significant deviation from HWE (P < 0.01) (Table 3). However, from the outcome of deviation from HWE for each locus in each population (Additional file 1: Table S1), the above extreme results are likely to be the consequence of mixing analysis for multi-populations. In over 420 pair-wise comparisons for linkage disequilibrium (Ld) among 16 loci in all P. sinensis populations, there were seven significant comparisons in LA and LSL, six significant comparisons in LP, LSH and SJ, as well as five significant comparisons in LSY (Table 3). In general, no consistencies were found to be significant in pair-wise comparisons for Ld indicating that there was no linkage among these loci and their inclusion will not affect the results of genetic variability [12].

Table 3 Summary statistics of the genetic diversity of P. sinensis populations

Genetic diversity among populations

Data for all parameters of genetic diversity for the seven P. sinensis populations were shown in Table 3. The LSL population presented the highest NA, He and Ar values, while the LSH population exhibited highest NE and Ho values. In the LD, LP, LA, LSL and SJ populations, FIS coefficients varied from 0.093 to 0.322, suggesting significant deficiencies of heterozygotes with 95% confidence interval. In the other two populations, FIS values did not show significantly different from zero.

Estimated of effective population size (Ne) and the mean ratio of the number of alleles to the range in allele size (M ratios) of each population were also listed in Table 3. The Ne values for LP and LSH showed to be the highest (infinite), whereas the lowest Ne value was 106.1 as shown in LA. M ratios among all populations ranged from 0.826 to 0.908, which revealed that no population experienced reduction in effective size. Reduction in effective size only occurs when M < 0.68 [13]. Potential genetic bottleneck analysis performed using a Wilcoxon sign-rank test under TPM with 90% single-step mutations, showed that all populations exhibited normal L-shaped distribution and might not have experienced a bottleneck recently (P > 0.05).

Genetic divergence and distance between populations

The pairwise Wrights fixation index (FST) and Cavalli-Sforza and Edwards’ genetic distance (D) [14] values were shown in Table 4, revealing significant differences among all populations. The analysis of molecular variance (AMOVA) revealed that genetic variation within and among populations was 82.76 and 17.24%, respectively. In addition, the variation among populations was found to be significant (P < 0.01) (Table 5).

Table 4 Pairwise of FST (above diagonal) and Genetic distance (below diagonal) for P. sinensis populations
Table 5 Analysis of molecular variances (AMOVAs) among seven P. sinensis populations

Basically, the FST values among the seven populations reflected their geographic relationships. Firstly, there was a very great genetic differentiation between SJ (the only population from Huaihe Drainage Basin in Shandong Province) and the other six populations (all in Liaoning Province) (FST > 0.25). Secondly, LD (Related River Drainage Basin) and all populations from Liaohe River Drainage Basin (LP, LA, LSL, LSY and LSH) exhibited a great genetic differentiation between (0.15 < FST < 0.25). Finally, in Liaohe River Drainage Basin, there was a moderate genetic differentiation among LP, LA, LSL, LSY and LSH (0.05 < FST < 0.15), meanwhile, LSL and LP, LSL and LSY populations showed little genetic differentiation (FST < 0.05) [15] A comparison of FST values revealed significant differences among the populations (P < 0.01).

As analyzed by the Mantel test, the correlation between geographical distances and a pairwise comparison with genetic distances (D values) was significantly correlated (r = 0.803, p = 0.001). The D value between the populations displayed similar pattern as FST values. For example, most D values between SJ and the other six populations including SJ vs LA, SJ vs LSL, SJ vs LYS, and SJ vs LSH, were larger than 0.20, suggesting that they were genetically disparate populations [16]. In addition, the D values between LD and populations of Liaohe Drainage Basins were also higher than those between latter populations.

The UPGMA dendrogram constructed according to the D values was shown in Fig. 2. The seven populations formed three major clusters, LD, SJ and the other five populations. Among the five populations, LP and LA as well as LSL and LSY populations were more similar.

Fig. 2
figure 2

UPGMA dendrogram of genetic relationship among the seven P. sinensis populations using genetic distance over 1000 replicates. Numbers at the roots of the branches are bootstrap values

Gene flow between locations ranged from 0.608 (LA into LD) to 3.545 (SJ into LP). Most long-term gene flow between populations were symmetric (the 95% credible intervals overlapped in their pairwise comparisons), except those between SJ and LP, LSY and LSL, and SJ and LSY (Additional file 2: Table S2). As for cumulative gene flow, most populations received more migrants than they supplied, whereas SJ and LSL present the opposite results (Additional file 3: Table S3).

The logarithm probabilities Ln P (X/K) related with different numbers of genetic clusters K, calculated from structure analysis of 319 individuals of P. sinensis showed the highest value at K = 2, and followed by K = 5. As shown in Fig. 3, based on the value of K = 2, the individuals from LD and SJ populations were merged to some extent and were obviously different from the individuals of the other five populations. Based on the value of K = 5, individuals from LD and SJ populations were significantly different from each other, and the other five populations were similar to some extent, with LP and LA as well as LSL, LSY and LSH being the more similar.

Fig. 3
figure 3

Bayesian clustering of all seven P. sinensis populations using 16 microsatellite markers. a: The ΔK values (K = 2–7). b: Analysis of seven populations (n = 319 individuals, K = 2 and 5)

Discussion

Analysis of microsatellite polymorphisms

One of the main objectives of this study was to detect polymorphisms in transcriptome-derived microsatellite makers and to further understand the genetic diversity and structure of the wild populations from different locations. Compared with genomic derived microsatellites, transcriptome-derived microsatellites have several advantages including high efficiency, strong transferability, and correlation with potential genes [10]. Although transcriptome-derived microsatellites are predicted to be relatively less polymorphic than those derived from genomic DNA because they are more likely under stringent evolutionary constraint, some studies have given different results [17, 18]. In this study, 16 polymorphic microsatellites were identified, with the NA value ranging from 2 to 13 (mean = 5.8), which is similar with studies investigating transcriptome-derived microsatellite markers in other decapod [10]. Additionally, compared to those genomic derived microsatellites in related species, such as M. rosenbergii [7] and M. nipponense [8], these 16 microsatellites present lower polymorphism. Since there is no report about genetic markers for P. sinensis so far, these polymorphic microsatellite makers and further mining of transcriptomic data may helpful for future research of P. sinensis and its related species.

Genetic diversity within populations

The mean NE, Ho and He values in seven P. sinensis populations ranged from 1.441–1.867, 0.158–0.369 and 0.226–0.379, respectively, showing that the genetic diversity of the populations was lower than that of other species of the Palaemonidae family [7, 8]. Five populations including LD, LP, LA, LSL and SJ showed deficiencies of heterozygotes as indicated by significant FIS values. The LD and SJ populations showed lower genetic diversity as indicated by NE, Ho and He values, which may be a result of a small Ne value, inbreeding and a limited number of shrimp samples. Results in the current study indicated that the Ne values of the four populations LD, LA, LSY and SJ, were lower than the values suggested by Franklin et al. [19]. This may due to the poor swimming ability of P. sinensis. In addition, due to time and space constraints in sampling, a certain number of individuals may have been from a few number of spawning shrimps, which would affect genetic diversity through an increase in genetic identity [20].

Most of the populations analyzed in this study showed a departure from HWE. This result was similar to several other studies which microsatellites derived from transcribed sequence data significantly depart from HWE [21,22,23]. This could be due to selection on polymorphisms in untranslated gene regions where these microsatellites typically reside, or to non-neutral dynamics of the genes to which they are physically linked [23]. However, the bottleneck effect and the M ratios indicated that all seven P. sinensis populations did not experience a bottleneck effect or a recent decline in quantity. Therefore, in future breeding projects, the level of inbreeding, brood shrimp population size and genetic diversity must be considered since domestication probably leads to a reduction in genetic variation due to genetic drift, selection and inbreeding [24].

Relationships among different populations

In this present study, subjects are collected from three geographically isolated drainage basins. LD population belongs to Related River Drainage in East Liaoning Peninsula; the SJ population belongs to Huaihe River Drainage in north China and the other five populations (LP, LA, LSL, LSY and LSH) come from Liaohe River Basin in central Liaoning Province. Specifically, LSL and LSY come from a closed lake and a semi-closed reservoir respectively; LA and LSH belong to inland tributaries; LP locates close to the mouth of Liaohe River.

AMOVA analysis revealed that the genetic differentiation among populations of P. sinensis, which accounted for 17.24% of the total genetic variation, was much lower than that within populations. Pairwise D and FST values were consistent with these results. According to Thorp [16], the pairwise D values between most of the populations of P. sinensis indicated that they were closely related populations. Likewise, excluding SJ and LD populations, the pairwise FST values among the other five populations were low to moderate [15]. The slight and moderate differences of pairwise FST values among five populations indicated they might share the same ancestors. Even so, the significant correlation between geographical and genetic distances as well as the gene flow values indicated that five populations from Liaohe River Basin were separate due to the habitat fragmentation. The divergence between SJ and the other six populations as well as the divergence between LD and the other five populations were due to long-term geographic separation. These outcomes were consistent with the Bayesian analysis in genetic structure simulations, which also revealed that the SJ and LD populations were much different from the other populations.

Conclusions

In this study, 16 polymorphic transcriptome-derived microsatellites were screened and used to assess the genetic diversity and structure among wild P. sinensis populations in China. All the polymorphic microsatellite makers are believed useful for evaluating the extent of genetic diversity and population structure of P. sinensis. Compared to the other five populations, the LD and SJ populations exhibited lower genetic diversity due to the lower NE, Ho and He. All populations showed normal L-shaped distributions and may not have experienced a bottleneck or recent reductions in effective size. The AMOVA analysis revealed that genetic variation among populations was 17.24% and much lower than that within populations. D and FST values between any two populations indicated that the LD and SJ populations differed from the other five populations. The UPGMA tree and the STRUCTURE analysis also supported the result. Therefore, they needed to be protected against further declines in genetic diversity. Among the seven populations, LP, LA, LSL, LSY and LSH populations were all from Liaohe River Drainage with a relatively high genetic diversity, and hence can be considered as hot spots for in-situ conservation of P. sinensis as well as sources of desirable alleles for breeding values. In future, further development of transcriptome-derived microsatellite markers is necessary for more detailed investigation on the genetic variation, genetic structure, and molecular markers-assisted selection (MAS) of P. sinensis. Overall, this study provided a theoretical basis for the protection, rational use and genetic breeding of germplasm resources.

Methods

Sampling and DNA extraction

Totaling 319 individual shrimps from seven wild populations of P. sinensis were collected in 2016 from Sha River in Liaoning Dalian (LD, n = 48), Shuangtaizi River in Liaoning Panjin (LP, n = 43), Yangliu River in Liaoning Anshan (LA, n = 48), Longwei Lake in Liaoning Shenyang (LSL, n = 48), Yangshi reservoir in Liaoning Shenyang (LSY, n = 48), Liao River in Liaoning Shenyang Huangjia (LSH, n = 48) and Dushan Lake in Shandong Jining (SJ, n = 36) (Fig. 1, Table 1). Genomic DNA was extracted from the muscles of each shrimp using a TIAnamp Marine Animals DNA Kit (TIANGEN) according to the manufacturer’s protocol.

Microsatellite selection and genotyping

More than 50 microsatellite loci were selected from the data of P. sinensis transcriptome (GenBank No. SRR5759507) [5], and all new primers were designed using Primer Premier 3.0 (http://bioinfo.ut.ee/primer3-0.4.0/). Primers were examined using varying PCR conditions and PCR amplified products were evaluated by agarose gels. PCR reactions were performed by ABI 2720 thermocycler (Applied Biosystems, USA). DNA samples were subsequently amplified for the polymorphic loci in single PCR reactions. PCR reactions were 15 μL, containing 50-100 ng template, 10 × buffer 1.5 μL, Mg2+ (25 mmol·L− 1) 1.5 μL, dNTPs (each of 10 mmol·L− 1) 0.25 μL, each of forward primer and reverse primer (10 μmol·L− 1) 0.15 μL, Taq polymerase 1 U, and ddH2O. Denaturation for 3 min at 94 °C was followed by 35 cycles made up of 25 s at 94 °C, 25 s at the 54 °C and 60 s at 72 °C. The final step was a prolonged extension of 10 min at 72 °C, followed by 4 °C hold. All individual genotypes were scored after the PCR products were resolved on Applied Biosystems 3730XL Genetic Analyzer (Applied Biosystems, USA) and the product size was analyzed by GeneMarker version 2.2.0. (Applied Biosystems, USA).

Statistical analysis

Sixteen polymorphic loci were used to detect genetic variation among P. sinensis populations (Table 2). The number of alleles (NA), the number of effective alleles (NE), observed heterozygosity (Ho) and expected heterozygosity (He) of each locus of each population was calculated using POPGENE 1.32 [25]. Null allele frequency (r) in each locus was estimated using software FreeNA [26] in which loci with estimated frequencies of null alleles above 0.2 were considered as potentially problematic for calculations. Meanwhile, the allele richness (Ar) and Wright’s FIS values with 95% confidence intervals were calculated according to Weir and Cockerham using FSTAT version 2.9.3.2 software [27]. Tests for Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (Ld) were conducted using Arlequin 3.5.2.2 [28] and a Markov chain of 1,000,000 and 100,000 Dememorisation Steps.

The estimates of effective population size (Ne) for each population was calculated using the gametic disequilibrium method implemented in LDNe 1.31 [29] based on the lowest allele frequency of 0.02 and confidence intervals estimated with the parametric method (which were highly similar to those estimated by the jackknife method). The mean ratio of the number of alleles to the range in allele size (M ratio) was used to assess recent changes in Ne value using Arlequin 3.5.2.2 [28]. Recent bottlenecks were performed using Bottleneck version 1.2.02 [30], under a two-phase model (TPM) with 90% single-step mutation. These methods test for departures from mutation-drift equilibrium based on heterozygosity excess or deficiencies. A Wilcoxon signed-rank test was used to determine whether a statistically significant number of loci displayed heterozygote excess compared to expectations based on the observed number of alleles.

Pairwise Wrights fixation index (FST) and analysis of molecular variances (AMOVAs) were calculated using Arlequin 3.5.2.2 [28]. Cavalli-Sforza and Edwards’ genetic distance (D) [14], computed using the INA correction method described in Chapuis and Estoup, was also calculated using FreeNA [26], and then constructed using the dendrogram with genetic distance based on UPGMA cluster through POPTREE2 [31]. The significance was tested based on 1000 bootstraps. A Mantel test was performed to estimate a correlation between the matrices of genetic and geographical distances using Arlequin 3.5.2.2 [28] (10,000 permutations).

Pairwise gene flow among all seven populations was calculated by using software Migrate-n 4.4.3 [32, 33] through likelihood approach. The program was run with the following setting, 10 long chains of 105, burn-in 1000, heating scheme (ten temperatures: 1.00, 1.12, 1.29, 1.50, 1.80, 2.25, 3.00, 4.50, 9.00, 1,000,000).

Genetic structures among populations analysis was performed via STRUCTURE v2.3.3 [34] using Bayesian methods. Parameters settings were assumed by an admixture model, with a burn-in of 50,000, with 100,000 Markov chain-Monte Carlo (MCMC) repetitions and 10 iterations per K (K = 2–7). The ΔK value was calculated by online software STRUCTURE HARVESTER [35] based on the rate of change in the log probability of data between successive K [36]. Plots of the clustering results were obtained using DISTRUCT [37].