Introduction

Microsatellite DNA is one of the most powerful genetic markers for wildlife conservation genetics and population genetics (Frankham et al. 2010; Guichoux et al. 2011). Recent advances in next-generation sequencing (NGS) technology have accelerated the development of novel microsatellite markers for target species (Gardner et al. 2011; Schoebel et al. 2013; Wei et al. 2014 and references therein). There are two main approaches to NGS-based microsatellite isolation: whole-genome shotgun sequencing (e.g., Abdelkrim et al. 2009) and sequencing of a microsatellite-enriched library (e.g., Malausa et al. 2011). The shotgun approach generates enough random sequences to obtain a sufficient number of microsatellite-containing sequences by chance. The enrichment approach, in contrast, constructs a microsatellite-enriched library by hybridizing genomic DNA fragments with repeat probes. After the non-hybridized DNA, which presumably lacks repeat regions, is washed away, the remaining DNA fragments are sequenced to isolate microsatellite-containing sequences. In both approaches, the constructed libraries are pooled for multiplexed sequencing on an NGS platform.

Although microsatellite-containing sequences are discovered more efficiently with the enrichment approach than with the shotgun approach (Malausa et al. 2011), constructing a microsatellite-enriched library requires specialized bench work and is somewhat complicated in a routine laboratory setting (Gonzalez and Zardoya 2013). Consequently, many researchers outsource the microsatellite-enrichment step to external services, which lengthens the time needed to obtain candidate sequences for marker development (Malausa et al. 2011; Gonzalez and Zardoya 2013). One solution to this problem is to make the conventional microsatellite-enrichment method more efficient and faster to perform in a routine laboratory setting.

In this study, we attempted to simplify the construction of a microsatellite-enriched library for multiplexed parallel sequencing. To reduce the time required, we used an easy-to-use commercially available kit for the hybridization and wash steps of the target capture method. In the conventional enrichment method, a microsatellite-enriched library is prepared first and is then ligated with NGS adapters and sequenced on an NGS platform. In our improved method, by contrast, an NGS shotgun library is prepared first, and microsatellite-containing DNA fragments are captured directly from it using the commercial kit. Genomic DNA from each fish species was fragmented by enzymatic digestion. The resulting NGS shotgun library was hybridized with a biotinylated CA repeat probe and then washed with the kit to remove non-hybridized DNA. Enriched libraries from 30 fish species were prepared for multiplexed parallel sequencing on a bench-top NGS platform (454 GS Junior, Roche), and three multiplexed sequencing runs were conducted to isolate microsatellite-containing sequences. The sequences were analyzed with a bioinformatic pipeline to identify and select microsatellite sequences and to design primer pairs. Finally, to validate the effectiveness of this approach, we tested whether primer sets designed from the microsatellite-containing sequences of ayu (Plecoglossus altivelis), a commercially important species in Japanese inland fisheries, could amplify their target loci and detect polymorphisms.

Materials and methods

DNA extraction and library preparation

Genomic DNA was extracted from 30 fish species (Table 1) with the DNeasy Blood and Tissue kit (Qiagen), the Gentra Puregene kit (Qiagen), or the Wizard Genomic DNA Purification kit (Promega). For each fish species, approximately 0.5–1 μg of genomic DNA was fragmented by digestion with dsDNA Fragmentase (New England Biolabs) for 15–20 min at 37 °C. The fragmented DNA was purified using the MinElute PCR Purification kit (Qiagen), and its size distribution was assessed on a 2100 Bioanalyzer with the DNA High Sensitivity kit (Agilent Technologies). From each fragmented DNA sample, a shotgun library was prepared using the NEBNext Quick DNA Library Prep Master Mix Set for 454 (New England Biolabs): the DNA was end-repaired, A-tailed, and ligated to one of the 12 multiplex identifier (MID) oligonucleotide adaptors (Roche) for multiplexed sequencing (Table 1). Small DNA fragments were then removed from each shotgun library using the AMPure purification system (Agencourt Bioscience).

Table 1 Summary of the sequencing libraries, sequencing results, and post-sequencing selection of microsatellite loci for each species

Target capture of microsatellite-containing DNA fragments

We performed target capture of microsatellite-containing DNA fragments using the CA repeat probe and the SeqCap EZ hybridization and wash kit (Roche), following the general guidelines of the NimbleGen SeqCap EZ Library LR User’s Guide ver. 1.0 (Roche) with slight modifications, as briefly described below. The shotgun libraries were amplified by 12–15 cycles of pre-capture linker-mediated PCR (LM-PCR). The size distribution of the pre-capture LM-PCR products was assessed on the 2100 Bioanalyzer using the DNA 7500 kit (Agilent Technologies), and the products were quantified using the NanoDrop 2000 (Thermo Scientific) or the Qubit dsDNA HS assay kit (Invitrogen). For comparison, one shotgun library (Lates japonicus) was prepared from its pre-capture LM-PCR product without capture. Approximately 0.5–1 μg of each pre-capture LM-PCR product was hybridized to 20 pmol of biotinylated probe [B-ATAGAATAT(CA)16] at 55 °C for 1 h; COT human DNA was omitted from the hybridization mixture. The hybridization mixture was then incubated with streptavidin-coated Dynabeads M-270 (Invitrogen), and non-captured material (unbound target DNA, presumably lacking microsatellites, and unbound probe) was washed away. After washing, each captured library was amplified by 15 cycles of post-capture LM-PCR. The size distribution of the post-capture LM-PCR products (the microsatellite-captured libraries) was assessed on a 2100 Bioanalyzer using the DNA 7500 kit (Agilent Technologies), and the libraries were quantified using the Qubit dsDNA HS assay kit (Invitrogen) or the KAPA Library quantification kit (Kapa Biosystems). In total, 30 microsatellite-captured libraries and one shotgun library were prepared and quantified.

Pooling libraries, emulsion PCR, and pyrosequencing

For each of the three multiplexed sequencing runs, a different pool of 9–12 libraries, distinguished by the 12 MID tags, was sequenced (Table 1). Each library pool was quantified using the KAPA Library quantification kit (Kapa Biosystems) and then sequenced separately on the GS Junior System (Roche). Emulsion PCR (0.2 DNA copies per amplification bead), emulsion breaking, and pyrosequencing were performed according to the manufacturer’s protocols for the Lib-L kit (Roche). After each run, the output SFF (Standard Flowgram Format) file was split according to the MID tag sequences using the ‘sfffile’ program (Roche), and the resulting SFF files were converted into FASTA format using the ‘sffinfo’ program (Roche).
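The demultiplexing step can be illustrated with a simplified Python sketch. It is not a reimplementation of Roche's ‘sfffile’, which works on SFF flowgram files; instead it assumes FASTA reads from which the sequencing key has already been trimmed, so that each read starts with its MID tag. The tag sequences and library names are hypothetical placeholders, and only exact matches at the start of a read are accepted.

```python
from collections import defaultdict

# Hypothetical placeholder tags and library names (not the Roche MID set).
MID_TAGS = {
    "TAGCTAGCTA": "library_01",
    "GATCGATCGA": "library_02",
}


def demultiplex(reads, tags=MID_TAGS):
    """Split (read_id, sequence) pairs into per-library bins by leading MID tag.

    Assumes the key sequence is already trimmed so each read starts with its
    MID; only exact prefix matches count, and the tag is removed from the read.
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        for tag, library in tags.items():
            if seq.startswith(tag):
                bins[library].append((read_id, seq[len(tag):]))
                break
        else:  # no tag matched this read
            bins["unassigned"].append((read_id, seq))
    return dict(bins)


reads = [("r1", "TAGCTAGCTACACACA"), ("r2", "GATCGATCGAGGG"), ("r3", "ACGTACGT")]
print(demultiplex(reads))
```

Under this exact-match rule, reads with a sequencing error inside the tag fall into the ‘unassigned’ bin.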

Data analysis and primer design

The FASTA files were analyzed using the pipeline QDD version 2 (Meglécz et al. 2010) to identify and select microsatellite sequences and to design primer pairs. Sequences longer than 100 bp that contained at least five repeats of a perfect microsatellite, that is, a single 2- to 6-bp motif repeated without interruption, were selected for further analysis. Sequence similarities were identified by an “all against all” BLAST (Altschul et al. 1997) analysis with an e-value threshold of 1E-40 and with microsatellite sequences soft-masked. Sequences exceeding 95% pairwise similarity in their flanking regions were grouped into contigs, and a 2/3 majority-rule consensus sequence was created for each contig. Sequences with significant BLAST hits to other sequences but with less than 95% overall similarity in the flanking regions were excluded to avoid potentially duplicated loci and mobile elements. Primer pairs were then designed for all unique sequences and consensus sequences with Primer3 (Rozen and Skaletsky 1999), as implemented in QDD, using the following criteria: (1) a PCR product length of 90–450 bp, with several primer pairs designed in silico for each sequence at 50-bp intervals; (2) an optimal primer length of 24 bp (range 20–30 bp); (3) an optimal primer pair annealing temperature of 63 °C (range 60–66 °C); and (4) an optimal GC content of 50% (range 20–80%).
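The length and repeat criteria and the 2/3 majority-rule consensus described above can be approximated in Python as follows. This is an illustrative stand-in, not QDD's own code: it scans each read for uninterrupted tandem copies of a 2- to 6-bp unit, ignores units that are themselves repeats of a shorter unit (such as ‘CACA’ or ‘AA’), and builds a consensus column by column from sequences assumed to be already aligned to equal length. Counting exactly two-thirds agreement as a majority and writing ‘N’ at undecided positions are assumptions of this sketch.

```python
from collections import Counter


def is_primitive(motif):
    """False if the motif is a repeat of a shorter unit (e.g. 'CACA', 'AA')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif
                   for d in range(1, k))


def find_perfect_microsatellites(seq, min_repeats=5, motif_lengths=range(2, 7)):
    """Return (start, motif, copies) for uninterrupted tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # count identical k-bp units that follow directly after position i
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if (copies >= min_repeats and is_primitive(motif)
                    and set(motif) <= set("ACGT")):
                hits.append((i, motif, copies))
                i += copies * k  # jump past the whole repeat
            else:
                i += 1
    return hits


def passes_selection(seq, min_length=100):
    """Keep reads longer than min_length that carry at least one perfect repeat."""
    return len(seq) > min_length and bool(find_perfect_microsatellites(seq))


def majority_consensus(aligned, threshold=2 / 3, ambiguous="N"):
    """Column-wise consensus of equal-length, pre-aligned sequences."""
    consensus = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= threshold else ambiguous)
    return "".join(consensus)


read = "TTGACCTGAG" + "CA" * 8 + "GGTTCAGTCA"
print(find_perfect_microsatellites(read))  # [(10, 'CA', 8)]
```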

Validation and characterization of microsatellite markers in ayu

To validate the effectiveness of the present approach, we screened candidate primer sets for ayu (P. altivelis). In ayu, 3405 unique and consensus sequences fulfilled the requirements for primer design (Table 1). During primer selection, QDD grouped the primers into seven primer designs (A–G). We chose only the most restrictive, design A (334 sequences), which fulfills the following conditions: (1) no single-base repeats in the flanking and primer regions; (2) no other target microsatellites in the flanking region; (3) no nanosatellites, that is, 3–4 tandem repetitions of a 2- to 6-bp motif, in the flanking and primer regions; and (4) no compound microsatellites. In the final step, we retained only sequences whose microsatellite motif was repeated more than ten times. After these exclusions, 88 sequences remained for primer synthesis. To facilitate multiplex PCR, we selected 44 of the 88 primer sets based on expected PCR product size, and the forward primer of each locus was synthesized with one of four universal tag sequences (Table 2; see also Blacket et al. 2012). When the four universal tag primers are labeled with four different fluorophores, multiple loci can be co-amplified in a single multiplex PCR.
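The locus filters used for design A can be approximated for a single candidate locus as below. This is a simplified sketch rather than QDD's implementation: the homopolymer cutoff (min_run=5) is a hypothetical value because the threshold is not stated here, a flank is rejected if any 2- to 6-bp motif occurs in three or more tandem copies (covering nanosatellites and additional microsatellites together), and the compound-microsatellite check (condition 4) is omitted. The final criterion keeps loci whose target motif is repeated more than ten times.

```python
import re


def has_homopolymer(seq, min_run=5):
    """True if any base occurs min_run or more times in a row.

    min_run=5 is a hypothetical cutoff chosen for illustration.
    """
    return re.search(r"([ACGT])\1{%d,}" % (min_run - 1), seq.upper()) is not None


def has_short_tandem_repeat(seq, min_copies=3):
    """True if some 2- to 6-bp motif (not a single-base run) occurs in at
    least min_copies tandem copies; with min_copies=3 this flags
    nanosatellites (3-4 copies) and longer microsatellites alike."""
    seq = seq.upper()
    for k in range(2, 7):
        # zero-width lookahead so that every start position is examined
        pattern = r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_copies - 1)
        for m in re.finditer(pattern, seq):
            if len(set(m.group(2))) > 1:
                return True
    return False


def passes_design_a_like(left_flank, right_flank, repeat_count, min_copies=11):
    """Keep a locus if both flanks are clean and the target motif is
    repeated more than ten times (i.e. at least 11 copies)."""
    for flank in (left_flank, right_flank):
        if has_homopolymer(flank) or has_short_tandem_repeat(flank):
            return False
    return repeat_count >= min_copies


print(passes_design_a_like("GATCCTGAGT", "TGCACACAGT", 12))  # False
```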

Table 2 Primer sequences and characteristics of 37 microsatellite loci for ayu, Plecoglossus altivelis

Initial screening of the 44 designed primer sets was performed on four ayu individuals collected from the Hidaka River, Wakayama, Japan. For this screening, single-locus PCRs were carried out in a 7 μl final reaction mixture containing 1 × GoTaq Green Master mix (Promega), 0.86 μM of each primer, and approximately 50–250 ng of template DNA. The thermal cycling profile was 95 °C for 5 min; 40 cycles of 94 °C for 15 s, 59 °C for 15 s, and 72 °C for 30 s; and a final extension at 60 °C for 7 min. The PCR products were separated by electrophoresis on a 1.5% agarose gel. The 37 primer sets that amplified successfully were then tested for their ability to detect polymorphisms in two forms of ayu, the amphidromous form and the landlocked Lake Biwa form, which previous studies using microsatellite DNA markers have shown to be substantially genetically differentiated (Takagi et al. 1999; Takeshima et al. 2009, 2016).

Forty-eight individuals of each form (the amphidromous form collected from the Hidaka River and the landlocked form collected from the Ane River in the Lake Biwa system, Shiga, Japan) were used to test the 37 primer sets for polymorphism. The 37 loci were amplified in five multiplex PCRs (Table 2). Reactions were carried out in a final volume of 7 μl (Sets I and II) or 4 μl (Sets III–V) containing 1 × Type-it Microsatellite PCR kit (Qiagen), approximately 50–250 ng of template DNA, and the forward primer, reverse primer, and universal tail primer in a 1:2:1 ratio (Blacket et al. 2012). The final concentration of the forward primer was optimized for each marker (Table 2). Each universal tail primer was fluorescently labeled with 6-FAM, VIC, NED, or PET (Applied Biosystems). The thermal cycling profile was 95 °C for 5 min; 40 cycles of 94 °C for 15 s, 59 °C for 15 s, and 72 °C for 30 s; and a final extension at 60 °C for 30 min. Microsatellite products were analyzed on a 3130XL Genetic Analyzer (Applied Biosystems) with the GeneScan 500 LIZ Size Standard (Applied Biosystems), and genotypes were scored with GeneMapper version 3.7 (Applied Biosystems).
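Because the three primers are combined at a fixed 1:2:1 ratio, the reverse and tail primer concentrations follow from the optimized forward-primer concentration, and each stock volume follows from C1V1 = C2V2. The sketch below assumes a hypothetical 10 μM working stock for every primer and the 7 μl reaction volume of Sets I and II; the example forward concentration and the 96-reaction premix are illustrative and are not taken from Table 2.

```python
def primer_volumes(forward_final_uM, reaction_ul=7.0, stock_uM=10.0,
                   ratio=(1, 2, 1), n_reactions=1):
    """Stock volumes (ul) of forward, reverse and universal tail primer.

    Final concentrations follow the forward:reverse:tail ratio; each volume
    comes from C1 * V1 = C2 * V2, scaled by the number of reactions.
    stock_uM=10.0 is a hypothetical working-stock concentration.
    """
    fwd, rev, tail = ratio
    finals_uM = {
        "forward": forward_final_uM,
        "reverse": forward_final_uM * rev / fwd,
        "tail": forward_final_uM * tail / fwd,
    }
    return {name: conc * reaction_ul * n_reactions / stock_uM
            for name, conc in finals_uM.items()}


# hypothetical example: 0.1 uM forward primer, premix for 96 reactions of 7 ul
print(primer_volumes(0.1, n_reactions=96))
```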

For each form of ayu, genetic variability parameters, including the number of alleles per locus (N_A), observed heterozygosity (H_O), and expected heterozygosity (H_E), were calculated at each locus using GENETIX 4.05 (Belkhir et al. 2004). Departure from Hardy–Weinberg equilibrium (HWE) within each form, by locus and across all loci, and linkage disequilibrium (LD) between loci were tested using GENEPOP 3.4 (Raymond and Rousset 1995). Both tests used the Markov chain method of Guo and Thompson (1992) to obtain unbiased estimates of exact P-values, with the following chain parameters: 10,000 dememorization steps, 100 batches, and 5000 iterations per batch. To control the type I error rate across multiple tests, a sequential Bonferroni correction (Rice 1989) was applied to the P-values. In addition, each locus was checked for null alleles in each form using MicroChecker ver. 2.2.3 (van Oosterhout et al. 2004). Genetic differentiation between the two forms was quantified by the fixation index (F_ST) for each locus (Weir and Cockerham 1984) using Arlequin ver. 3.0 (Excoffier et al. 2005), with statistical significance estimated from 16,000 permutations.
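The sequential Bonferroni correction described by Rice (1989) is Holm's step-down procedure: P-values are ranked from smallest to largest, the i-th smallest of m is compared with α/(m − i + 1), and testing stops at the first non-significant result. A minimal Python sketch follows; how the tests are grouped into families (for example, all per-locus HWE tests within one form) is left to the caller.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Holm's sequential Bonferroni procedure (as described by Rice 1989).

    Returns a list of booleans in the same order as p_values:
    True means the test remains significant after correction.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest P first
    significant = [False] * m
    for rank, idx in enumerate(order):
        # rank 0 is compared with alpha/m, the largest P-value with alpha/1
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # all larger P-values are then non-significant as well
    return significant


print(sequential_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```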

Results and discussion

The results of the three multiplexed sequencing runs are summarized in Table 1. The runs yielded 126,196, 165,625, and 204,813 sequences, with average read lengths of 298, 280, and 320 bp, respectively. The 30 microsatellite-captured libraries contained high proportions of microsatellite-containing sequences, ranging from 43 to 79%. Consequently, a sufficient number of primer sets for developing microsatellite markers (1029–6606) could be designed for each species. Among the sequences for which primers were designed, the proportion with a CA motif was also high, ranging from 71 to 95% across the 30 microsatellite-captured libraries.

To validate the effectiveness of our approach for microsatellite isolation, we screened candidate primer sets designed for ayu. The results for the 37 primer sets in the two forms of ayu are summarized in Table 2. Across the two forms, N_A, H_E, and H_O ranged from 2 to 24, from 0.021 to 0.948, and from 0.021 to 0.958, respectively. After sequential Bonferroni correction (P < 0.05), significant deviations from HWE were observed in only 9 of the 74 probability tests (Table 2). One possible cause of the observed heterozygote deficiency is null alleles: in microsatellite studies, mutations at primer-binding sites can prevent some alleles from amplifying. MicroChecker analyses detected signs of null alleles at Plal-015, Plal-025, Plal-029, and Plal-035 in both forms and at Plal-006, Plal-020, and Plal-041 in the amphidromous form only. The agreement between the HWE tests and MicroChecker suggests a high probability of null alleles at Plal-015, Plal-025, Plal-029, and Plal-035.
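For reference, the per-locus parameters reported above can be computed from diploid genotypes as in the sketch below. N_A is the number of distinct alleles, H_O the proportion of heterozygous individuals, and H_E is returned both as 1 − Σp_i² and with Nei's (1978) small-sample correction 2n/(2n − 1), since which estimator underlies the GENETIX values is not stated here. Individuals with missing data are skipped; the genotypes in the example are hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """N_A, H_O and H_E for one locus from diploid genotypes.

    genotypes: list of (allele1, allele2) tuples; an individual with a
    missing allele (None) is skipped.
    """
    typed = [g for g in genotypes if None not in g]
    n = len(typed)
    counts = Counter(allele for g in typed for allele in g)
    total = 2 * n  # number of sampled gene copies
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return {
        "N_A": len(counts),
        "H_O": sum(a != b for a, b in typed) / n,
        "H_E": he,
        # Nei's (1978) small-sample correction: 2n / (2n - 1)
        "H_E_unbiased": he * total / (total - 1),
    }


# hypothetical genotypes (allele sizes in bp) for five individuals, one untyped
example = [(150, 152), (150, 150), (152, 154), (150, 152), (None, None)]
print(locus_summary(example))
```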

In both forms of ayu, LD analyses showed only 3 significant pairwise comparisons between loci (of 1330 comparisons) after sequential Bonferroni correction (P < 0.05), with no consistent association between particular loci, suggesting that the loci examined are largely independent. Per-locus F_ST values between the two forms ranged from −0.004 to 0.248 and were significantly different from zero at 23 of the 37 loci (P < 0.05). The allelic variability at these 23 microsatellite markers should make them useful for studying the population structure of ayu.

In this study, we used a commercially available kit for target capture and performed multiplexed parallel sequencing on a bench-top NGS platform. Our approach yielded large numbers of microsatellite-containing sequences, which provided enough information to design candidate primer sets for microsatellite markers. From these sequences, we developed 23 microsatellite DNA markers useful for analyzing the population structure of ayu. These results demonstrate the effectiveness of our approach for microsatellite marker development in a routine laboratory setting, and the approach could also be applied to develop microsatellite markers for other fishes or other organisms. In the present study, microsatellite capture and sequencing took approximately 1 week, substantially less than the conventional approach (usually more than 2 weeks; Malausa et al. 2011). In addition, preparing the microsatellite-captured library was simple.

To evaluate the benefit of microsatellite capture, we compared the microsatellite-captured library and the shotgun library of Lates japonicus (Table 1). The proportion of sequences for which primers could be designed was about 5.8 times higher in the microsatellite-captured library (3622/11,730 sequences; 31%) than in the shotgun library (264/4987; 5%), indicating that microsatellite capture is substantially more effective than the shotgun method for developing microsatellite markers.
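The fold difference can be recomputed directly from the counts for Lates japonicus given above. Using the unrounded proportions gives roughly 5.8, whereas dividing the rounded percentages (31/5) gives 6.2:

```python
# Sequences for which primers were designed, out of all sequences (Lates japonicus)
captured_with_primers, captured_total = 3622, 11730
shotgun_with_primers, shotgun_total = 264, 4987

captured = captured_with_primers / captured_total
shotgun = shotgun_with_primers / shotgun_total
fold = captured / shotgun

print(f"captured: {captured:.1%}, shotgun: {shotgun:.1%}, fold: {fold:.1f}")
# prints "captured: 30.9%, shotgun: 5.3%, fold: 5.8"
print(f"fold from rounded percentages (31/5): {31 / 5:.1f}")
# prints "fold from rounded percentages (31/5): 6.2"
```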

Finally, we briefly discuss how this approach could be used in future research. In the present study, we used only the CA repeat probe to capture microsatellite DNA from the target fishes, because CA is the most common dinucleotide motif in vertebrate genomes (Chistiakov et al. 2006). If the approach is applied to other organisms (e.g., invertebrates and plants), probes with other repeat motifs should be used. Although Roche has announced that it will stop supporting the 454 GS Junior platform by mid-2016, our approach using the SeqCap EZ hybridization and wash kit (Roche) could readily be adapted to Illumina’s MiSeq platform with the 300-bp paired-end sequencing format. Our approach may therefore be useful for developing novel microsatellite markers for conservation genetics and population genetics research.