Microsatellite development from genome skimming and transcriptome sequencing: comparison of strategies and lessons from frog species

Xia, Yun; Luo, Wei; Yuan, Siqi; Zheng, Yuchi; Zeng, Xiaomao

doi:10.1186/s12864-018-5329-y

Microsatellite development from genome skimming and transcriptome sequencing: comparison of strategies and lessons from frog species

Research article
Open access
Published: 07 December 2018

Volume 19, article number 886, (2018)
Cite this article

Download PDF

You have full access to this open access article

BMC Genomics Aims and scope Submit manuscript

Microsatellite development from genome skimming and transcriptome sequencing: comparison of strategies and lessons from frog species

Download PDF

Yun Xia¹,
Wei Luo^1,2,
Siqi Yuan^1,3,
Yuchi Zheng¹ &
…
Xiaomao Zeng ORCID: orcid.org/0000-0002-1952-2170¹

3193 Accesses
15 Citations
10 Altmetric
1 Mention
Explore all metrics

Abstract

Background

Even though microsatellite loci frequently have been isolated using recently developed next-generation sequencing (NGS) techniques, this task is still difficult because of the subsequent polymorphism screening requires a substantial amount of time. Selecting appropriate polymorphic microsatellites is a critical issue for ecological and evolutionary studies. However, the extent to which assembly strategy, read length, sequencing depth, and library layout produce a measurable effect on microsatellite marker development remains unclear. Here, we use six frog species for genome skimming and two frog species for transcriptome sequencing to develop microsatellite markers, and investigate the effect of different isolation strategies on the yield of microsatellites.

Results

The results revealed that the number of isolated microsatellites increases with increased data quantity and read length. Assembly strategy could influence the yield and the polymorphism of microsatellite development. Larger k-mer sizes produced fewer total number of microsatellite loci, but these loci had a longer repeat length, suggesting greater polymorphism. However, the proportion of each type of nucleotide repeats was not affected; dinucleotide repeats were always the dominant type. Finally, the transcriptomic microsatellites displayed lower levels of polymorphisms and were less abundant than genomic microsatellites, but more likely to be functionally linked loci.

Conclusions

These observations provide deep insight into the evolution and distribution of microsatellites and how different isolation strategies affect microsatellite development using NGS.

Microsatellite discovery in an insular amphibian (Grandisonia alternans) with comments on cross-species utility and the accuracy of locus identification from unassembled Illumina data

Article Open access 29 July 2016

A rapid and cost-effective approach for the development of polymorphic microsatellites in non-model species using paired-end RAD sequencing

Article 20 June 2017

A Galaxy-based bioinformatics pipeline for optimised, streamlined microsatellite development from Illumina next-generation sequencing data

Article Open access 02 August 2016

Background

Microsatellites, also known as simple sequence repeats (SSRs), are highly polymorphic and generally neutral molecular markers that are still extensively used as genetic markers in evolution and ecology [1,2,3]. However, the traditional protocols to produce SSRs using microsatellite-enriched libraries, cloning and Sanger sequencing [4] are time consuming and resource intensive. Especially for nonmodel species, microsatellites often have been developed de novo using this process, due to a low rate of cross-species microsatellite amplification success [5, 6]. To overcome these obstacles, next-generation sequencing (NGS) has achieved faster and higher throughput nucleotide sequence data, which facilitate the identification and characterization of microsatellites in nonmodel species. Indeed, hundreds of microsatellite loci were isolated without any screening steps using this high-throughput sequencing technology [7, 8]. Microsatellite loci have been isolated using many NGS platforms, such as Roche 454 [9,10,11], Ion Torrent PGM [12,13,14], Illumina [15, 16], and PacBio [17]. The popularity of using NGS not only varies by platforms but also by the sources of the sequences. Both genome skimming and transcriptome sequencing have been used to mine microsatellite markers [18,19,20].

Although the recent development of NGS techniques has facilitated the de novo development of SSRs, this task is still quite expensive, and the subsequent polymorphism screening requires a substantial amount of time [21, 22]. Obtaining good polymorphic loci usually requires further polymerase chain reaction (PCR) optimization. The mutational dynamics of microsatellites have diversely evolved in different species and in variant repeat types of SSRs [1, 23], and they are important in understanding how to quickly obtain SSRs with high successful amplification rate and non-monomorphic loci for microsatellite isolation. One strategy can efficiently differentiate polymorphic loci from monomorphic loci, by screening multiple genotypes and selecting only those loci that already show differences in repeat length within the set of sequence reads for marker development [21, 24, 25]. Yet, this strategy requires the pooling of multiple genotypes for NGS library construction. Actually, most studies do not make use of such mixed multiple genotypes for library layout [12, 15,16,17]. Instead, microsatellite data may already or soon be available as a byproduct of NGS. For such cases, we can also use the properties of microsatellites to determine the selection range. A well-known relationship exists between microsatellite length and polymorphism level, due to mutation rate increases with an increasing number of repeat units in an appropriate range [1, 26]. Microsatellites with longer repeat lengths are likely to be highly variable. Intuitively, this seems understandable; more repeat units generally show higher mutation rates because replication slippage may increase in proportion to the number of repeats [1, 27]. In addition, a negative correlation between microsatellite repeat length and substitution rate in nearby flanking sequence was found in mammalian genomes [28]. A low substitution rate in nearby flanking sequences would increase the rate of successful amplification. Getting a high number of repeats may help improve the polymorphism of microsatellites and their allelic richness, especially important for species with low genetic variability.

Researchers often tend to select the most polymorphic markers for subsequent studies and to discard limited polymorphic or monomorphic loci. However, empirical studies have shown that choosing the most polymorphic markers leads to an ascertainment bias and generally overestimates genetic diversity [26, 29]. Furthermore, researchers also tend to select tetranucleotide repeat markers, because they are easier to distinguish, as they cause fewer stutter bands than dinucleotides when screening the polymorphism [1, 23]. However, mutation rates of tetranucleotide loci are varied among species. Some studies reported that tetranucleotide loci have higher mutation rates [13, 30], but not supported in Drosophila [31, 32]. Hence, researchers should primarily take this information into account when selecting the appropriate microsatellite loci for ecological and evolutionary studies [29, 33].

The polymorphism of microsatellites is not only affected by the properties of microsatellite repeat units, but also related to the sources of omics data and the assembly of NGS [19, 34]. Genome skimming is by far one of the simplest methodologies for NGS, involving random sampling of a small percentage of total genomic DNA, which can obtain sequences containing plenty of microsatellites [35]. Transcriptome sequencing, or RNA-seq, is also used to isolate SSRs. Microsatellites present in protein coding regions, however, have low allele polymorphism and short repeat lengths, because SSRs could lead to gain or loss of gene function via frame shift mutations [36]. SSR expansions or contractions within genes can eventually lead to phenotypic changes. Yet, microsatellite size homoplasy, characterizing DNA fragments identical in state but not by descent due to convergent evolution, would be more likely occur in genomic SSRs, due to higher mutation rate of genomic SSRs compared to transcriptomic SSRs [37, 38]. Additionally, the assembler chosen for NGS would significantly affect the quality of genome or transcriptome reconstructions, which require special attention during downstream analysis [39, 40]. The microsatellites isolated from NGS could differ depending on sequencing method, read length, library layout, sequencing depth and genome composition. The genome size could also affect the frequency of microsatellites [23, 41]. However, such comprehensive evaluations are only fragmentarily reported in a limited number of species [11, 17]. It remains unclear the extent to which assembly strategy imposes a measurable effect on microsatellite marker development. A deeper understanding of the isolation of microsatellites from NGS is therefore needed, not only to understand evolutionary and mutational properties of microsatellites but also to appropriately use microsatellite markers in ecological and evolutionary studies.

Here, we chose six frog species for genome skimming in two runs of Illumina MiSeq and two frog species for transcriptome sequencing by using Illumina HiSeq to isolate microsatellites for these species. The efficiency of microsatellite development from different sequencing methods and genome composition was compared. Additionally, we investigated how the quantity or sequencing depth of NGS data, read length, and assembly strategy affect the number and polymorphism of the isolated microsatellites.

Methods

Six frog species were collected for genome skimming by Illumina MiSeq sequencing: Amolops chunganensis, A. mantzorum, Hyla chinensis, Hylarana latouchii, Quasipaa boulengeri, and Rhacophorus chenfui. A single individual for each species was chosen for genomic sequencing. A single individual from two species, A. mantzorum and Q. boulengeri, were also used for transcriptome sequencing. All the samples were collected in the field of China, and the location and other information for each sample is shown in Additional file 1: Table S1. The frogs in this study were anesthetized and euthanized in 0.2% MS222 (ethyl3-aminobenzoate methanesulfonate) to collect target tissues. Genomic DNA was extracted from muscle tissue using SDS-proteinase K/phenol-chloroform protocol [42]. RNA was extracted individually from six types of tissues (cerebrum, heart, liver, skeletal muscle, skin, and testicle) using TRIzol® reagent according to the manufacturer’s protocol (Life Technologies, Shanghai, China) and mixed with approximately the same quantity. All work with animals was conducted according to relevant national and international guidelines on the Protection of Wildlife (Chengdu Institute of Biology issued permit number CIB#2014–36 and CIB#2014–110). All animal care and experimental procedures were approved by the Animal Care and Use Committee of Chengdu Institute of Biology, Chinese Academy of Sciences (Permit Number: CIB-20121220A). All voucher specimens were deposited in the Herpetological Museum of Chengdu Institute of Biology, Chinese Academy of Sciences.

Genomic DNA and RNA were shipped to Novogene Bioinformatics Technology (Beijing, China) for library construction and sequencing with Illumina MiSeq (PE300) and Illumina HiSeq 2000 (PE125), respectively. Briefly, 3–32 μg of DNA or RNA for each sample is used for library preparation. The concentrations and volumes of DNA and RNA are shown in Additional file 1: Table S1. The paired-end library for genome skimming was prepared using the TruSeq DNA PCR-Free Sample Preparation Kit (Illumina, USA) (insert size approximately 500 bp), following the protocols for Illumina DNA sample preparation. For transcriptome sequencing, libraries were generated using TruSeq SR Cluster Kit v3-cBot-HS (Illumina, USA) following the manufacturer’s recommendations. These libraries were pooled for sequencing on the Illumina platforms (MiSeq for genomic libraries, HiSeq 2000 for transcriptomic libraries) at Novogene Bioinformatics Technology (Beijing, China).

Contig assembly and microsatellite detection

The raw sequence reads were inspected with FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and adaptors were trimmed using Trimmomatic [43]. In this step, all raw reads with more than 10% missequenced nucleotides (poly-N) were discarded, and reads with more than 50% of the bases having a Q-value ≤20 were filtered out. De novo assembly of the cleaned reads for genome sequencing for each dataset was performed using SOAPdenovo2 [44] and Trinity [45]. Read preprocessing and assembly parameters followed the individual programs’ guidelines. For SOAPdenovo2, we used 6 k-mer sizes between 55 and 80, with a step size of 5 for the manual specification of assembly parameter values. For each assembly, we monitored the change in total number of contigs and N50 size over the assessed parameter range. For transcriptome sequencing, Trinity and SOAPdenovo2 (8 k-mer sizes between 25 and 60, with a step size of 5) were used for the assembly of the cleaned reads.

The identification of microsatellite loci from the assembly data was performed with QDD3 v3.1.2 [46]. We searched for di-, tri-, tetra-, penta- and hexanucleotide pure repeated units, with at least 7 tandem repeats for dinucleotides, and 6 repeats for the others. To eliminate redundancy, the similarity of sequences containing microsatellite regions was detected by all-against-all BLAST searching. Primer pairs were designed using Primer 3.0 software [47] to meet the following restrictions: the target amplification size range was 120–500 bp, the primer annealing temperature was restricted to 55–65 °C and the primer length was 20–27 bp. In order to avoid the possible bias from software, MISA [48] was also used to identify microsatellite markers, following the default parameter values.

Length and quantity simulation

We conducted a simulation experiment to investigate the effects that different read length and quantity distributions may have on the discovery of microsatellites and on their primer design. For the read length simulation, we selected the genome sequencing data from Amolops chunganensis, and chose five size distributions of reads (100, 125, 150, 175 and 200) for detection. The reduced dataset for each size were performed in ART [49]. The genomic datasets of Amolops mantzorum and Quasipaa boulengeri were used for data quantity simulation. Five datasets (full, 1/2, 1/4, 1/8, and 1/16 of the total reads) were selected randomly for the quantity simulation. Both the length and the quantity simulation datasets were assembled using SOAPdenovo2 and Trinity. Microsatellite discovery and primer design were performed using QDD3 v3.1.2 and Primers 3.0 software.

PCR amplification and genotyping

To test the polymorphism and stability of the detected SSRs, we chose Amolops mantzorum and Quasipaa boulengeri for PCR amplification. The sample information for each species is provided in Additional file 2: Table S2. We synthesized primers for the loci with a target product size of approximately 120–400 bp for amplification. A total of 100 loci for A. mantzorum, and 62 loci for Q. boulengeri were used for amplification. Eight samples for each species were used for preliminary testing of the primer amplification success rate. Then, electrophoresis was performed with an 8% nondenaturing polyacrylamide gel to evaluate polymorphism. The loci, which could not be successfully amplified, and gave multiple bands, faint bands or bands of unexpected sizes, were discarded from subsequent analyses. Eventually, the forward primers of those loci that showed variation were labeled with one of the fluorescent dyes (FAM, HEX or TAMRA) for a further polymorphism test. For the polymorphism test, we used 64 individuals of A. mantzorum and 96 individuals of Q. boulengeri to analyses. PCR amplifications were performed in a volume of 15 μl containing 7.5 μl 2× EasyTaq PCR SuperMix (TransGen Biotech, Beijing, China), 0.5 μl each primer (10 pm/μL), 5.5 μl ddH₂O and approximately 1 μl diluted genomic DNA (50–200 ng/μl). Reaction conditions included an initial 5 min denaturing step at 95 °C, followed by 32 cycles of 95 °C for 30 s, 55–65 °C annealing for 30 s, and 72 °C extension for 30 s, with a final extension of 7 min at 72 °C. The primer sequences, specific annealing temperatures and other information for all the primers used, are shown in Additional file 3: Table S3. The PCR products were run on an ABI-3730xl sequencer and genotyped using GENEMAPPER 4.0 (Applied Biosystems).

The null alleles and genotyping errors for each locus were checked using Microchecker v2.2.3 [50]. The number of alleles, allelic size range, observed and expected heterozygosity were determined using GENEPOP v4.2 [51]. Tests for deviation from Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were also calculated using GENEPOP v4.2.

Results

Sequence data from Illumina NGS

Genomic sequencing was performed using two runs of Illumina MiSeq, and each run included three frog species (Additional file 1: Table S1). Each run of Illumina MiSeq (PE300) had 22 million raw PE reads. The statistics for sequencing data from each species are shown in Table 1. Each species had almost 4 Gb raw data. For transcriptome sequencing, a total of 125.53 million raw PE reads were obtained from Illumina HiSeq (5.04 Gb and 7.32 Gb, for Amolops mantzorum and Quasipaa boulengeri, respectively). The raw reads have been deposited in the NCBI SRA database (Bioproject accession number: PRJNA329172, PRJNA347498 and PRJNA374468). After removal of adaptors and low-quality reads, the clean reads were used for the subsequent assembly. The data utilization ratio for Miseq (43.98–85.09%) was much smaller than that for Hiseq (96.27–97.37%) (Additional file 1: Table S1).

Table 1 Overview of the sequencing data from Illumina next-generation sequencing for each species of frog

Full size table

NGS assembly and microsatellite detection

The total number of contigs and the average length of contigs are shown in Fig. 1. For the SOAPdenovo2 assembly for each species, the smaller k-mer sizes produced more contigs in total, but these contigs had a shorter average length. All the k-mer sizes used for SOAPdenovo2 produced over 2 million contigs for each species, but Trinity produced only a quarter as many contigs. Although fewer contigs were assembled from Trinity than SOAPdenovo2, the Trinity assembler collected contigs with significantly longer average lengths (Fig. 1a, b).

For microsatellite loci detection, each assembly strategy also revealed impressive differences based on QDD results. The smaller k-mer sizes led to the detection of more microsatellite loci with flanking sequences that could be used for primer design (Fig. 1c). Nevertheless, the smaller k-mer size led to the detection of microsatellite loci with lower maximum and mean repeat region length (RRL) (Fig. 1d, e). In contrast, Trinity identified the fewest microsatellite loci and the lowest mean RRL, but with the longest maximum RRL, except in one species (Hyla chinensis; Fig. 1). All the assembly strategies produced sufficient numbers of microsatellite loci for further analysis, even the lowest number produced by Trinity was greater than 1000. For the entire dataset, dinucleotide repeats predominate, followed by tri- and tetranucleotide repeats, and penta- and hexanucleotides repeats are least abundant. From the results of MISA, the number of SSRs decreased with increasing k-mer sizes, while the maximum RRL increased with increasing k-mer sizes (Additional file 4: Table S4). Take Amolops chunganensis for example, the largest number of SSRs is when K = 60 (196,837), and the smallest number of SSRs is when K = 80 (179885). For the maximum RRL, the smallest is 62 (K = 60), and the largest is 84 (K = 80). The tendency was not affected by the adopted software. Notably, the numbers of identified SSRs were different between the two softwares, since MISA includes all the number of identified SSRs, while QDD only counts the number of identified SSRs with primers.

For RNA-Seq, Trinity is better than SOAPdenovo to obtain average length of contigs and total number of microsatellite loci at any k-mer size (Additional file 5: Table S5). But the maximum and mean RRL in SOAPdenovo were larger than those in Trinity. A total of 121,416 and 278,803 contigs were obtained from Trinity program for Amolops mantzorum and Quasipaa boulengeri, respectively (Table 2). The total number of contigs for RNA-Seq was much smaller than that obtained for genomic sequencing, but the average length of contigs was longer with RNA-Seq for both species. Both the total number of microsatellite loci and the maximum RRL in RNA-Seq were obviously lower than that obtained from genomic sequencing (p < 0.01). Dinucleotide repeats were also the most abundant category, followed by tri- and tetranucleotide repeats, similar to the results from genome sequencing.

Table 2 Assembly and microsatellite loci detection statistics for transcriptome sequencing using Trinity

Full size table

Data quantity and microsatellite discovery

Amolops mantzorum and Quasipaa boulengeri were chosen for data quantity simulation. The statistics of SSR discovery for each dataset are shown in Table 3 and Fig. 2. The total number of identified microsatellite loci was reduced by the decline in the quantity of data. As expected, the number of microsatellite loci was associated with the size of dataset. For instance, a twofold increase in data quantity, such as from half to full in Amolops mantzorum, resulted in a nearly twofold increase in the number of microsatellite loci detected. Furthermore, the number of SSRs for each type of nucleotide repeat also followed this tendency. In other words, the smaller the quantity of NGS data, the fewer the microsatellite loci identified. However, the results showed very little difference in maximum and mean RRL for each dataset (Table 3). The relation of the k-mer sizes to the data quantity reflected that the total number of contigs and microsatellites decreased with increasing k-mer sizes (Fig. 2a-d). There are overlaps of contigs and microsatellites when different k-mers with varied data quantity.

Table 3 Effects of data quantity on microsatellite discovery based on genomic data (k-mer size = 65)

Full size table

Read length simulation

To test how the length of the NGS data affects the final quality and the number of SSR loci, we selected the genome sequencing data from Amolops chunganensis for length simulation. The results showed a positive relationship between read length and effectiveness of microsatellite isolation (Fig. 3). The shorter the read length, the fewer microsatellite loci were identified. Interestingly, the number of microsatellite loci identified was nearly equal to the read lengths of 125 and 150, implying that for short sequence read assembly, when read length was not increased dramatically, the read length could not directly influence the number of microsatellite loci isolated. The proportion of each type of nucleotide repeat was not affected by read length (Fig. 3b). With increasing k-mer sizes, the total number of contigs and microsatellites would decreases (Fig. 2 e, f).

Microsatellite development and polymorphism

The characteristics of these newly discovered microsatellites of Amolops mantzorum and Quasipaa boulengeri are summarized in Additional file 3: Table S3. For A. mantzorum, 30 loci could be easily genotyped and displayed polymorphism in 64 individuals from western Sichuan province, China (Additional file 2: Table S2). This population is possibly in Hardy Weinberg equilibrium with loci MT88, MT10, MT84, MT39, MT1, showing signs of a null allele. The number of alleles per locus ranged from 2 to 21 with an average of 9. The observed and expected heterozygosity per locus ranged from 0.078 to 0.984 and from 0.076 to 0.918, respectively (Additional file 3: Table S3). For Q. boulengeri, 29 loci showed a clear band of the expected size and displayed polymorphisms in 96 individuals from two populations (Additional file 2: Table S2). The loci QB02, QB18, QB34, QB16, showed signs of a null allele. The number of alleles ranged from 2 to 23, with an average of 7.23. The observed and expected heterozygosity ranged from 0.300 to 0.922 and from 0.237 to 0.937, respectively. A few markers in both species were found to deviate from Hardy-Weinberg equilibrium, suggesting an association with the presence of null alleles (Additional file 2: Table S2).

Combining the genomic library microsatellites and transcriptome derived microsatellites from previous studies for Amolops mantzorum and Quasipaa boulengeri [52,53,54], we compared the properties of microsatellites from different sources. In total, there were 30 genomic and 15 transcriptomic loci for A. mantzorum; and 40 genomic and 32 transcriptomic loci for Q. boulengeri (Additional file 3: Table S3). Both the number of alleles (Na) and the observed heterozygosity (Ho) were affected by the sources of omics data (Fig. 4). The microsatellites derived from transcriptome have significantly lower Na and Ho values than those from genomic loci in Quasipaa boulengeri (independent-samples T test, p < 0.01), but not significantly lower in Amolops mantzorum (p > 0.05). In addition, transcriptomic microsatellites possessed fewer repeats and a narrower size distribution than genomic loci (Fig. 5; Additional file 3: Table S3). The number of repeats positively influenced the Na for genomic data (for A. mantzorum, Pearson’s correlation coefficient r = 0.61, two-tailed p < 0.01; for Q. boulengeri, Pearson’s correlation coefficient r = 0.481, two-tailed p < 0.01). However, the number of repeats and Na were not significantly correlated in transcriptomic SSRs (for A. mantzorum, Pearson’s correlation coefficient r = 0.329, p = 0.231; for Q. boulengeri, Pearson’s correlation coefficient r = 0.333, two-tailed p = 0.06).

Discussion

In the era of NGS, SSRs are still widely used in ecology and evolution [2], because they have their own advantages for population genetics compared to other molecular markers (e.g., single nucleotide polymorphisms, SNPs) [55, 56]. For example, SSRs can be successfully amplified from low quality of DNA (e.g. ancient DNA or museum specimens); the success rate of cross-amplification for SSRs in closely related species is typically higher than that for SNPs [55, 57]. Microsatellites have remained the markers of choice for various population and conservation genetics studies for parentage analysis and for haplotyping in non-model organisms [58, 59]. Specifically, they can provide new insights into genome and chromosome evolution [60, 61]. Their applications in genetics are extensive due to their ceaseless mutation, widespread length variations, co-dominance and reproducibility [1, 23]. Using conventional methods, obtaining nucleotide sequence data for microsatellites and their flanking regions in a target species requires cost and time, particularly for screening, cloning and sequencing microsatellite regions. Utilization of NGS to identify microsatellites is becoming more accessible and feasible, but it still lacks a comprehensive evaluation of how read length, library layout, sequencing depth and assembly strategy influence microsatellite isolation. In this study, we find large variations in the quality of results depending on the different datasets and assemblers used.

Data quantity, read length, assembly and SSR abundance

Microsatellites are present in the genome of most taxa and are abundant in the genome. Although the abundance of microsatellites differs between taxa and even closely related species, millions of microsatellite loci have been detected in human and mouse genomes [1]. The results from our data quantity simulation revealed that the number of isolated microsatellites increases following an increase in the data quantity when the data quantity is smaller than the entire genome (Table 3, Fig. 2), which can be attributed to the abundance of microsatellites in frog species. In addition, we have isolated these microsatellites randomly or almost randomly from the genome, because there are no dramatic differences in nucleotide repeats and maximum and mean RRL between each dataset. At least two explanations might account for the observation of such a distribution. Firstly, microsatellites may be randomly distributed over the euchromatic genome [62]. However, empirical studies using genome sequencing and fluorescence in situ hybridization (FISH) have shown uneven and nonrandom SSR distribution within and between genomes, where microsatellite density is higher near the centromeres and telomeres [60, 63] . Secondly, the random construction of library layout with genome skimming also resulted in such random isolation. Genome skimming is random sampling of a small percentage of total genomic DNA, which can randomly accept sequences containing microsatellites [35].

The yield of microsatellites through genome skimming could be affected not only by sequencing depth, but also by genome size (C-value). Generally, the frequency of microsatellites correlates positively with genome size [23, 41]. For these six frogs, unfortunately, the genome sizes are unclear, except Quasipaa boulengeri [64]. The C-value of Q. boulengeri is 3.06–3.11 [65, 66]. The coverage for Q. boulengeri is approximately 1.6X (4.97 Gigabase). For other species, the coverage could be estimated to 1–2X according to the C-value of the closely related species for these frogs (C-value: 2–7). Due to low coverage sequencing for genome skimming, the data quantity simulation shows the yield of microsatellites for each species increased with increasing sequencing depth (Table 3). However, the highest yield of microsatellites among these species was Q. boulengeri, which didn’t have the largest sequencing depth (Fig. 1; Table 1). The number of isolated microsatellites don’t increase following an increase in the data quantity between species. The genome size could affect the frequency of microsatellites. Due to lack of C-value of these species, this assumption need to be further verified. Notably, genome size has a negative effect on amplification success rate [67]. Species with a large genome size, such as salamanders, have a low amplification success rate for microsatellite development [68], due to the relative amounts of target DNA and of available primers for amplifying target DNA could be reduced.

Read length is considered to be one of the most important factors for microsatellite development using NGS [12, 17]. The shorter read lengths can influence SSR development by: 1) reducing the likelihood of discovering longer repeat length microsatellites and 2) reducing the potential success of designing primers [12, 17]. The read length for Illumina is generally shorter than that for the Ion Torrent and the GS-FLX (454) platforms. MiSeq has the longest read lengths (2 × 300 bp) of the Illumina platforms, but its output consists of relatively few reads. HiSeq has shorter reads than MiSeq; the new HiSeq v2 reagents allow 2 × 250 bp in rapid run mode [2]. Despite the shortcoming of read length using Illumina platforms, microsatellite isolation can largely overcome this shortcoming with upgrades in the assembly strategy. After short read assembly, many microsatellite markers have been developed (Fig. 1, Additional file 3: Table S3). Our results (Figs. 2, 3) are in accordance with prior evidence that the shorter the read length, the fewer the microsatellite loci identified [17]. This indicates that microsatellite development from short sequence read assembly would also be influenced by both the length of the flanking regions and the repeat length of microsatellites.

Assembly strategy can also influence the processes involved in microsatellite development. Generally, de Bruijn graph (DBG) based assemblers, such as SOAPdenovo and Trinity, are suited for short read lengths and high-coverage data with less run time and memory complexity [69]. However, the choice of an appropriate k-mer size is one of the major difficulties in using DBG assemblers [39]. Our results show that the quality of both genome reconstructions and microsatellite development depends essentially on k parameter values (Figs.1, 2). The difference of the yields of microsatellites can be attributed to the low coverage of genome skimming and the properties of DBG. When the coverage is low, using a short k-mer for DBG assemblers can be critical for successful assembly, but has a limited ability to resolve repeat sequences, such as microsatellite. In contrast, a longer k-mer size can better resolve repeats but may result in fragmented assembly due to insufficient coverage [70]. Therefore, a longer k-mer size led to the detection of fewer total microsatellite loci and longer repeat length, which have a higher polymorphism level for downstream analysis [26, 27] . For Trinity, because of eliminating both low-complexity and singleton k-mers as initial seeds for contig extensions in Inchworm’s step, it causes the low number of contigs and microsatellites [71]. The longer repeat lengths are not an error of assembly, because they have the same amplification success as shorter repeats (Additional file 2: Table S2, Additional file 3: S3). For transcriptome sequencing, Trinity is better than SOAPdenovo at any k-mer size with regard to average length of contigs and total number of microsatellite loci (Additional file 5: Table S5). It accounts for the high coverage of transcriptome sequencing and algorithmic advantages of Trinity. This observation is consistent with empirical studies that Trinity is suited for de novo transcriptome assembly [72].

Microsatellite polymorphism

NGS may provide lots of potential microsatellites, but selecting appropriate polymorphic microsatellites is still a critical issue for downstream analysis. Initial polymorphism detection can be performed by pooling multiple individuals when constructing sequencing libraries to investigate whether candidate loci are polymorphic [21, 73]. However, such prior mixing of samples for library construction has not been used in many NGS studies [12, 15,16,17]. Whatever the methods used, finding polymorphic microsatellite loci is the critical issue. Fortunately, these microsatellites provide repeat type and length, which is indirect information to infer polymorphism. The mutation process seems to be heterogeneous with respect to loci, repeat types and organisms, but it is widely accepted that microsatellites with a greater number of repeats are more mutable [1, 23, 74]. Alleles with a higher number of repeats often mutate at a higher rate, and the relationship between length and rate has been reported to be exponential rather than linear [23]. The results from genomic SSRs revealed that the number of repeats or repeat lengths positively influenced the Na (p < 0.01) (Fig. 4), supporting this hypothesis. But these correlations are very weakly for transcriptomic SSRs (p > 0.05), and it could be attributed to the functional constraint on transcriptomic SSRs [36]. In addition, those microsatellite loci with long repeat lengths have another advantage in that their nearby flanking sequences have a low substitution rate, which would increase the rate of successful amplification [28]. Of course, a higher mutation rate SSRs with longer repeat length is not necessarily better. Due to ascertainment bias and overestimation of genetic diversity caused by the most polymorphic markers [26, 29], the random selection of SSRs should depend upon the subsequent downstream analyses.

Transcriptome versus genome microsatellite markers

Both transcriptomic and genomic sequencing identified many potential microsatellites, but the number and properties of these microsatellites were quite different. Transcriptomic SSRs are expected to display lower levels of polymorphism than genomic SSRs, as they are associated with conserved regions of the genome [19, 75]. In our study, the polymorphic screening in Quasipaa boulengeri showed that both Na and Ho were lower in transcriptomic than genomic microsatellites (Figs. 4, 5). This low polymorphism could be explained by transcriptomic SSRs are often non-neutral loci, which might bias the estimates of population divergence under the assumption of a neutral model of drift, mutation and migration [19, 34]. In addition, a few study found that the level of SSR variability is likely to differ between coding and untranslated regions [48, 76]. That could be the reason why microsatellites derived from transcriptome were different between Quasipaa boulengeri and Amolops mantzorum. But it needs to be further investigated. The expansions or contractions of microsatellites within genes can eventually lead to phenotypic changes or loss of function [36]. Though low polymorphism for SSRs from transcriptomic sequencing, some studies would still chose this method to develop functionally linked loci. Because of the functional constraint on transcriptome SSRs, it is more easily to find functional linked loci from transcriptome than genome. The transcriptomic sequence is more conserved than noncoding genome sequence, and more likely to Blast on its sibling species with reference sequences of genome or transcriptome. It is very useful for non-model species to study loci location and function. For Quasipaa boulengeri, we employed both genomic and transcriptomic SSRs to isolate sex-linked microsatellite markers. And one locus from transcriptome sequencing data was found [77]. Overall, these results forge a framework for our deep understanding of the evolution and distribution of microsatellites and how different isolation strategies affect microsatellite development using NGS.

Conclusions

In the era of NGS, microsatellite data may already or soon be available as a byproduct of NGS, and it is still widely used in ecology and evolution. By using genome skimming and transcriptomic sequencing of non-model frog species, we investigated how the assembly strategy, read length, sequencing depth, and library layout affect the number and polymorphism of the isolated microsatellites. These results forge a framework for our deep understanding of the evolution and isolation strategies of microsatellites: 1) the number of isolated microsatellites increases with increased data quantity and read length; 2) assembly strategy could also influence the processes involved in microsatellite development; 3) the transcriptomic microsatellites displayed lower levels of polymorphisms and were less abundant than genomic microsatellites; 4) the proportion of each type of nucleotide repeats was not affected.

Abbreviations

DBG:: de Bruijn graph
FISH:: Fluorescence in situ hybridization
Ho:: observed heterozygosity
Na:: Number of alleles
NGS:: Next-generation sequencing
PCR:: Polymerase chain reaction
RRL:: Repeat region length
SNPs:: Single nucleotide polymorphisms
SSRs:: Simple sequence repeats

References

Ellegren H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004;5(6):435–45.
Article CAS Google Scholar
Hodel RGJ, Segovia-Salcedo MC, Landis JB, Crowl AA, Sun M, Liu X, Gitzendanner MA, Douglas NA, Germain-Aubrey CC, Chen S, et al. The report of my death was an exaggeration: a review for researchers using microsatellites in the 21st century. Appl Plant Sci. 2016;4(6):1600025.
Article Google Scholar
Selkoe KA, Toonen RJ. Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers. Ecol Lett. 2006;9(5):615–29.
Article Google Scholar
Zane L, Bargelloni L, Patarnello T. Strategies for microsatellite isolation: a review. Mol Ecol. 2002; 11(1):1–16.
Barbará T, Palma-Silva C, Paggi GM, Bered F, Fay MF, Lexer C. Cross-species transfer of nuclear microsatellite markers: potential and limitations. Mol Ecol. 2007;16(18):3759–67.
Article Google Scholar
Primmer CR, Merilä J. A low rate of cross-species microsatellite amplification success in Ranid frogs. Conserv Genet. 2002;3(4):445–9.
Article CAS Google Scholar
Abdelkrim J, Robertson BC, Stanton J-AL, Gemmell NJ. Fast, cost-effective development of species-specific microsatellite markers by genomic sequencing. BioTechniques. 2009;46(3):185–92.
Article CAS Google Scholar
Csencsics D, Brodbeck S, Holderegger R. Cost-effective, species-specific microsatellite development for the endangered dwarf bulrush (Typha minima) using next-generation sequencing technology. J Hered. 2010;101(6):789–93.
Article CAS Google Scholar
Fernandez-Silva I, Whitney J, Wainwright B, Andrews KR, Ylitalo-Ward H, Bowen BW, Toonen RJ, Goetze E, Karl SA. Microsatellites for next-generation ecologists: a post-sequencing bioinformatics pipeline. PLoS One. 2013;8(2):e55990.
Article CAS Google Scholar
Gardner MG, Fitch AJ, Bertozzi T, Lowe AJ. Rise of the machines – recommendations for ecologists when using next generation sequencing for microsatellite development. Mol Ecol Resour. 2011;11(6):1093–101.
Article Google Scholar
Schoebel C, Brodbeck S, Buehler D, Cornejo C, Gajurel J, Hartikainen H, Keller D, Leys M, Říčanová Š, Segelbacher G. Lessons learned from microsatellite development for nonmodel organisms using 454 pyrosequencing. J Evol Biol. 2013;26(3):600–11.
Article CAS Google Scholar
Elliott CP, Enright NJ, Allcock RJN, Gardner MG, Meglécz E, Anthony J, Krauss SL. Microsatellite markers from the ion torrent: a multi-species contrast to 454 shotgun sequencing. Mol Ecol Resour. 2014;14(3):554–68.
Article CAS Google Scholar
Igawa T, Nozawa M, Nagaoka M, Komaki S, Oumi S, Fujii T, Sumida M. Microsatellite marker development by multiplex ion torrent PGM sequencing: a case study of the endangered Odorrana narina complex of frogs. J Hered. 2015;106(1):131–7.
Article Google Scholar
Sultana N, Igawa T, Nozawa M, Islam MM, Hasan M, Alam MS, Khan MMR, Sumida M. Development and characterization of 27 new microsatellite markers for the Indian bullfrog Hoplobatrachus tigerinus and its congeneric species. Genes Genet Syst. 2014;89(3):137–41.
Article CAS Google Scholar
Castoe TA, Poole AW, de Koning APJ, Jones KL, Tomback DF, Oyler-McCance SJ, Fike JA, Lance SL, Streicher JW, Smith EN, et al. Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake. PLoS One. 2012;7(2):e30953.
Article CAS Google Scholar
Lance SL, Love CN, Nunziata SO, O’Bryhim JR, Scott DE, Flynn RW, Jones KL. 32 species validation of a new Illumina paired-end approach for the development of microsatellites. PLoS One. 2013;8(11):e81853.
Article Google Scholar
Wei N, Bemmels JB, Dick CW. The effects of read length, quality and quantity on microsatellite discovery and primer development: from Illumina to PacBio. Mol Ecol Resour. 2014;14(5):953–65.
CAS PubMed Google Scholar
Drechsler A, Geller D, Freund K, Schmeller DS, Künzel S, Rupp O, Loyau A, Denoël M, Valbuena-Ureña E, Steinfartz S. What remains from a 454 run: estimation of success rates of microsatellite loci development in selected newt species (Calotriton asper, Lissotriton helveticus, and Triturus cristatus) and comparison with Illumina-based approaches. Ecol Evol. 2013;3(11):3947–57.
Article Google Scholar
Dufresnes C, Brelsford A, Béziers P, Perrin N. Stronger transferability but lower variability in transcriptomic- than in anonymous microsatellites: evidence from Hylid frogs. Mol Ecol Resour. 2014;14(4):716–25.
Article CAS Google Scholar
Rico C, Normandeau E, Dion-Côté A-M, Rico MI, Côté G, Bernatchez L. Combining next-generation sequencing and online databases for microsatellite development in non-model organisms. Sci Rep. 2013;3:3376.
Article Google Scholar
Vukosavljev M, Esselink GD, van ‘t Westende WPC, Cox P, Visser RGF, Arens P, Smulders MJM. Efficient development of highly polymorphic microsatellite markers based on polymorphic repeats in transcriptome sequences of multiple individuals. Mol Ecol Resour. 2015;15(1):17–27.
Article CAS Google Scholar
Lopez L, Barreiro R, Fischer M, Koch M. Mining microsatellite markers from public expressed sequence tags databases for the study of threatened plants. BMC Genomics. 2015;16(1):781.
Article Google Scholar
Bhargava A, Fuentes FF. Mutational dynamics of microsatellites. Mol Biotechnol. 2010;44(3):250–66.
Article CAS Google Scholar
Cardoso SD, Gonçalves D, Robalo JI, Almada VC, Canário AVM, Oliveira RF. Efficient isolation of polymorphic microsatellites from high-throughput sequence data based on number of repeats. Mar Genom 2013; 11(0):11–16.
Eschbach E, Schöning S. Identification of high-resolution microsatellites without a priori knowledge of genotypes using a simple scoring approach. Methods Ecol Evol. 2013;4(11):1076–82.
Article Google Scholar
Brandström M, Ellegren H. Genome-wide analysis of microsatellite polymorphism in chicken circumventing the ascertainment bias. Genome Res. 2008;18(6):881–7.
Article Google Scholar
Petit RJ, Deguilloux MF, Chat J, Grivet D, Garnier-Géré P, Vendramin GG. Standardizing for microsatellite length in comparisons of genetic diversity. Mol Ecol. 2005;14(3):885–90.
Article CAS Google Scholar
Santibáñez-Koref MF, Gangeswaran R, Hancock JM. A relationship between lengths of microsatellites and nearby substitution rates in mammalian genomes. Mol Biol Evol. 2001;18(11):2119–23.
Article Google Scholar
Queirós J, Godinho R, Lopes S, Gortazar C, de la Fuente J, Alves PC. Effect of microsatellite selection on individual and population genetic inferences: an empirical study using cross-specific and species-specific amplifications. Mol Ecol Resour. 2015;15(4):747–60.
Article Google Scholar
Weber JL, Wong C. Mutation of human short tandem repeats. Hum Mol Genet. 1993;2(8):1123–8.
Article CAS Google Scholar
Chakraborty R, Kimmel M, Stivers DN, Davison LJ, Deka R. Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci U S A. 1997;94(3):1041–6.
Article CAS Google Scholar
Schug MD, Hutter CM, Wetterstrand KA, Gaudette MS, Mackay TF, Aquadro CF. The mutation rates of di-, tri- and tetranucleotide repeats in Drosophila melanogaster. Mol Biol Evol. 1998;15(12):1751–60.
Article CAS Google Scholar
Cao L-J, Li Z-M, Wang Z-H, Zhu L, Gong Y-J, Chen M, Wei S-J. Bulk development and stringent selection of microsatellite markers in the western flower thrips Frankliniella occidentalis. Sci Rep. 2016;6:26512.
Article CAS Google Scholar
Postolache D, Leonarduzzi C, Piotti A, Spanu I, Roig A, Fady B, Roschanski A, Liepelt S, Vendramin GG. Transcriptome versus genomic microsatellite markers: highly informative multiplexes for genotyping Abies alba mill. And congeneric species. Plant Mol Biol Rep. 2014;32(3):750–60.
Article CAS Google Scholar
Dodsworth S. Genome skimming for next-generation biodiversity analysis. Trends Plant Sci. 2015;20(9):525–7.
Article CAS Google Scholar
Li Y-C, Korol AB, Fahima T, Nevo E. Microsatellites within genes: structure, function. and Evolution Mol Biol Evol. 2004;21(6):991–1007.
Article CAS Google Scholar
Estoup A, Jarne P, Cornuet JM. Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Mol Ecol. 2002;11(9):1591–604.
Article CAS Google Scholar
Khimoun A, Ollivier A, Faivre B, Garnier S. Level of genetic differentiation affects relative performances of expressed sequence tag and genomic SSRs. Mol Ecol Resour. 2017;17(5):893–903.
Article CAS Google Scholar
Greshake B, Zehr S, Dal Grande F, Meiser A, Schmitt I, Ebersberger I. Potential and pitfalls of eukaryotic metagenome skimming: a test case for lichens. Mol Ecol Resour. 2016;16(2):511–23.
Article CAS Google Scholar
Hunt M, Newbold C, Berriman M, Otto TD. A comprehensive evaluation of assembly scaffolding tools. Genome Biol. 2014;15(3):42.
Article Google Scholar
Hancock JM. Genome size and the accumulation of simple sequence repeats: implications of new data from genome sequencing projects. Genetica. 2002;115(1):93–103.
Article CAS Google Scholar
Sambrook J, Russell DW. Molecular cloning: a laboratory manual. New York: Cold Spring Harbor Press; 2001.
Google Scholar
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Article CAS Google Scholar
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):18.
Article Google Scholar
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M. De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512.
Article CAS Google Scholar
Meglécz E, Pech N, Gilles A, Dubut V, Hingamp P, Trilles A, Grenier R, Martin JF. QDD version 3.1: a user-friendly computer program for microsatellite selection and primer design revisited: experimental validation of variables determining genotyping success rate. Mol Ecol Resour. 2014;14(6):1302–13.
Article Google Scholar
Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000;132(3):365–86.
CAS PubMed Google Scholar
Thiel T, Michalek W, Varshney R, Graner A. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2003;106(3):411–22.
Article CAS Google Scholar
Huang W, Li L, Myers JR, Marth GT. ART: a next-generation sequencing read simulator. Bioinformatics. 2012;28(4):593–4.
Article Google Scholar
Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P. MICRO-CHECKER: software for identifying and correcting genotyping errors in microsatellite data. Mol Ecol Notes. 2004;4(3):535–8.
Article Google Scholar
Rousset F. genepop’007: a complete re-implementation of the genepop software for windows and Linux. Mol Ecol Resour. 2008;8(1):103–6.
Article Google Scholar
Luo W, Xia Y, Wang Q, Zeng X. Isolation and characterization of fifteen microsatellite markers for Sichuan torrent frog Amolops mantzorum (Anura: Ranidae). Conserv Genet Resour. 2015;7(2):489–91.
Article Google Scholar
Xia Y, Hu LJ, Shan X, Zheng YC, Zeng XM. Isolation and characterization of eleven polymorphic tetranucleotide microsatellite loci for Quasipaa boulengeri (Anura: Dicroglossidae). Conserv Genet Resour. 2013;5(1):5–7.
Article Google Scholar
Yuan S, Xia Y, Zheng Y, Zeng X. Development of microsatellite markers for the spiny-bellied frog Quasipaa boulengeri (Anura: Dicroglossidae) through transcriptome sequencing. Conserv Genet Resour. 2015;7(1):229–31.
Article Google Scholar
Guichoux E, Lagache L, Wagner S, Chaumeil P, LÉGer P, Lepais O, Lepoittevin C, Malausa T, Revardel E, Salin F, et al. Current trends in microsatellite genotyping. Mol Ecol Resour. 2011;11(4):591–611.
Article CAS Google Scholar
Ljungqvist M, ÅKesson M, Hansson B. Do microsatellites reflect genome-wide genetic diversity in natural populations? A comment on Väli et al. (2008). Mol Ecol. 2010;19(5):851–5.
Article Google Scholar
Sharma PC, Grover A, Kahl G. Mining microsatellites in eukaryotic genomes. Treands Biotechnol. 2007;25(11):490–8.
Article CAS Google Scholar
Huang J, Li YZ, Du LM, Yang B, Shen FJ, Zhang HM, Zhang ZH, Zhang XY, Yue BS. Genome-wide survey and analysis of microsatellites in giant panda (Ailuropoda melanoleuca), with a focus on the applications of a novel microsatellite marker system. BMC Genomics 2015; 16(1):61.
Jan C, Fumagalli L. Polymorphic DNA microsatellite markers for forensic individual identification and parentage analyses of seven threatened species of parrots (family Psittacidae). PeerJ. 2016;4:e2416.
Article Google Scholar
Ruiz-Ruano F, Cuadrado Á, Montiel E, Camacho J, López-León M. Next generation sequencing and FISH reveal uneven and nonrandom microsatellite distribution in two grasshopper genomes. Chromosoma. 2015;124(2):221–34.
Article CAS Google Scholar
Stolle E, Kidner JH, Moritz RFA. Patterns of evolutionary conservation of microsatellites (SSRs) suggest a faster rate of genome evolution in Hymenoptera than in Diptera. Genome Biol Evol. 2013;5(1):151–62.
Article Google Scholar
Schlötterer C. Evolutionary dynamics of microsatellite DNA. Chromosoma. 2000;109(6):365–71.
Article Google Scholar
Li YC, Korol AB, Fahima T, Beiles A, Nevo E. Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol. 2002;11(12):2453–65.
Article CAS Google Scholar
Gregory TR. 2005. Animal genome size Database. http://www.genomesize.com. Accessed 10 Oct 2017.
Murphy RW, Lowcock LA, Smith C, Darevsky IS, Orlov N, MacCulloch RD, Upton DE. Flow cytometry in biodiversity surveys: methods, utility, and constraints. Amphibia-Reptilia. 1997;18(1):1–13.
Article Google Scholar
Vinogradov AE. Genome size and GC-percent in vertebrates as determined by flow cytometry: the triangular relationship. Cytometry. 1998;31(2):100–9.
Article CAS Google Scholar
Garner TW. Genome size and microsatellites: the effect of nuclear size on amplification potential. Genome. 2002;45(1):212–5.
Article CAS Google Scholar
Van De Vliet MS, Diekmann OE, Serrao EA, Beja P. Isolation of highly polymorphic microsatellite loci for a species with a large genome size: sharp-ribbed salamander (Pleurodeles waltl). Mol Ecol Resour. 2009;9(1):425–8.
Article Google Scholar
Compeau PEC, Pevzner PA, Tesler G. How to apply de Bruijn graphs to genome assembly. Nat Biotechnol. 2011;29(11):987–91.
Article CAS Google Scholar
Chen K, Chen L, Fan X, Wallis J, Ding L, Weinstock G. TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res. 2014;24(2):310–7.
Article Google Scholar
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
Article CAS Google Scholar
Huang X, Chen XG, Armbruster PA. Comparative performance of transcriptome assembly methods for non-model organisms. BMC Genomics. 2016;17(1):1–14.
Google Scholar
Andersen JC, Mills NJ. iMSAT: a novel approach to the development of microsatellite loci using barcoded Illumina libraries. BMC Genomics. 2014;15(1):1.
Article Google Scholar
Bachtrog D, Agis M, Imhof M, Schlötterer C. Microsatellite variability differs between dinucleotide repeat motifs—evidence from Drosophila melanogaster. Mol Biol Evol. 2000;17(9):1277–85.
Article CAS Google Scholar
Ellis J, Burke J. EST-SSRs as a resource for population genetic analyses. Heredity. 2007;99(2):125–32.
Article CAS Google Scholar
Scott KD, Eggler P, Seaton G, Rossetto M, Ablett EM, Lee LS, Henry RJ. Analysis of SSRs derived from grape ESTs. Theor Appl Genet. 2000;100(5):723–6.
Article CAS Google Scholar
Yuan S, Xia Y, Zeng XA. Sex-linked microsatellite marker reveals male Heterogamety in Quasipaa boulengeri (Anura: Dicroglossidae). Asian Herpetol Res. 2017;8(3):184–9.
Google Scholar

Download references

Acknowledgments

We thank Han Yang, Weizhao Yang and Xiang Tao for the NGS bioinformatic analysis. We gratefully acknowledge technical support from the laboratory of X.Z.

Funding

This research was supported by the National Key Programme of Research and Development, Ministry of Science and Technology (2017YFC0505202), and National Natural Science Foundation of China (NSFC–31401960, 31572241, 31572243), and the China Postdoctoral Science Foundation (2014 M552386).

Availability of data and materials

Genome skimming sequence data deposited in the NCBI SRA: Accession no. SRR3929656, SRR5248586, SRR5248585, SRR5248584, SRR3929655, SRR5248583; Transcriptome sequencing data, NCBI SRA Accession no. SRR5248938, SRR5248937.

Author information

Authors and Affiliations

Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
Yun Xia, Wei Luo, Siqi Yuan, Yuchi Zheng & Xiaomao Zeng
University of Chinese Academy of Sciences, Beijing, 100049, China
Wei Luo
College of Bioengineering, Sichuan University of Science & Engineering, Zigong, 643000, China
Siqi Yuan

Authors

Yun Xia
View author publications
You can also search for this author in PubMed Google Scholar
Wei Luo
View author publications
You can also search for this author in PubMed Google Scholar
Siqi Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Yuchi Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Xiaomao Zeng
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

YX and XZ conceived and designed the study. YX and YZ performed the NGS bioinformatic analysis. SY and WL performed the laboratory experiments, microsatellite genotyping and population genetic analyses. SY, WL and YZ collected samples. YX, YZ and XZ wrote the manuscript. All authors reviewed the manuscript and approved the final version.

Corresponding author

Correspondence to Xiaomao Zeng.

Ethics declarations

Ethics approval and consent to participate

All work with animals was conducted according to relevant national and international guidelines on the Protection of Wildlife (Chengdu Institute of Biology issued permit number CIB#2014–36 and CIB#2014–110). All animal care and experimental procedures were approved by the Animal Care and Use Committee of Chengdu Institute of Biology, Chinese Academy of Sciences (Permit Number: CIB-20121220A). All voucher specimens were deposited in the Herpetological Museum of Chengdu Institute of Biology, Chinese Academy of Sciences.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Table S1. Summary of sample information for next-generation sequencing and microsatellite discovery. (XLSX 12 kb)

Additional file 2:

Table S2. Microsatellite screening data of Amolops mantzorum and Quasipaa boulengeri for each sample. (XLS 94 kb)

Additional file 3:

Table S3. Characteristics of the microsatellite markers for Amolops mantzorum and Quasipaa boulengeri. (XLSX 38 kb)

Additional file 4:

Table S4. Number of identified SSRs by MISA from genomic data of Amolops chunganensis and Amolops mantzorum. (XLSX 12 kb)

Additional file 5:

Table S5. Statistics and assembly results for transcriptome sequencing of Amolops mantzorum and Quasipaa boulengeri using SOAPdenovo2 with different k-mer sizes and Trinity. (XLSX 12 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Cite this article

Xia, Y., Luo, W., Yuan, S. et al. Microsatellite development from genome skimming and transcriptome sequencing: comparison of strategies and lessons from frog species. BMC Genomics 19, 886 (2018). https://doi.org/10.1186/s12864-018-5329-y

Download citation

Received: 10 November 2017
Accepted: 28 November 2018
Published: 07 December 2018
DOI: https://doi.org/10.1186/s12864-018-5329-y

Microsatellite development from genome skimming and transcriptome sequencing: comparison of strategies and lessons from frog species

Abstract

Background

Results

Conclusions

Similar content being viewed by others

Background

Methods

Contig assembly and microsatellite detection

Length and quantity simulation

PCR amplification and genotyping

Results

Sequence data from Illumina NGS

NGS assembly and microsatellite detection

Data quantity and microsatellite discovery

Read length simulation

Microsatellite development and polymorphism

Discussion

Data quantity, read length, assembly and SSR abundance

Microsatellite polymorphism

Transcriptome versus genome microsatellite markers

Conclusions

Abbreviations

References

Acknowledgments

Funding

Availability of data and materials

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Publisher’s Note

Additional files

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation