Microsatellite markers are used extensively in molecular ecology and conservation genetics, as well as in stock assignment and assessments of genetic diversity for commercial fish (e.g. Hansen et al. 2001a, b; Perez-Enriquez et al. 1999). One of the most commercially important species in the North Sea and Baltic Sea is the Atlantic herring (Clupea harengus; ICES 2011). However, remarkably few genetic resources have been developed for this species: only nine microsatellite markers are currently available for Atlantic herring (McPherson et al. 2001) and 14 for Pacific herring, Clupea pallasii (Olsen et al. 2002). In this study, we used next-generation sequencing to develop transcriptome-derived microsatellite markers for the Atlantic herring.

Two herring (one of German and one of Finnish origin) were obtained from a public aquarium (Sea Life Helsinki) and euthanized with MS222. Total RNA was extracted from 30 mg of gill tissue using the RNeasy Mini Kit (Qiagen, Finland); the resulting concentrations were 371.3 and 332.6 ng/μl. cDNA was synthesized using the SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen). The concentrations were equalised, and the samples were barcoded, pooled, and sequenced on half a plate using a GS FLX machine (454 Life Sciences/Roche) by the University of Helsinki sequencing service. Barcodes and poly-A tails were removed from the data, and the sequences (416,521 reads, mean length 283.7 bp) were assembled using MIRA 3.2.1 (Chevreux et al. 2004). This assembly (unpadded; 33,701 contigs, mean length 315.8 bp, maximum length 5,284 bp) was used with QDD (Meglécz et al. 2010) to identify microsatellites and design primers. In total, 266 simple markers and 75 compound markers with primer binding sites and ≥5 repeat units were identified. Of these, 155 simple markers were chosen for testing because they had either (1) ≥6 repeat units of a motif of any size, or (2) five repeat units of a motif between three and six base pairs long.
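The two selection criteria can be illustrated with a minimal Python sketch. This is not the QDD algorithm: it only finds perfect tandem repeats with a regular expression and ignores primer design and compound markers. The names `Repeat`, `find_repeats` and `passes_selection` are hypothetical, as is the choice to allow motifs of 1 to 6 bp in the detector.

```python
import re
from typing import List, NamedTuple


class Repeat(NamedTuple):
    motif: str  # repeated unit, e.g. "AC"
    units: int  # number of tandem copies
    start: int  # 0-based start position in the contig


def find_repeats(seq: str, min_units: int = 5,
                 min_motif: int = 1, max_motif: int = 6) -> List[Repeat]:
    """Find perfect tandem repeats with at least `min_units` copies.

    The lazy quantifier makes the regex try the shortest motif first,
    so "ACACACACAC" is reported as AC x 5 rather than ACAC x 2.
    The motif size range (1-6 bp) is an assumption of this sketch.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_units - 1)
    )
    return [
        Repeat(m.group(1), len(m.group(0)) // len(m.group(1)), m.start())
        for m in pattern.finditer(seq.upper())
    ]


def passes_selection(rep: Repeat) -> bool:
    """Apply the two criteria stated in the text.

    (1) >= 6 repeat units of a motif of any size, or
    (2) exactly five repeat units of a 3-6 bp motif.
    """
    if rep.units >= 6:
        return True
    return rep.units == 5 and 3 <= len(rep.motif) <= 6
```

Under these assumptions, a dinucleotide repeat with five units would be detected but not selected for testing, whereas a trinucleotide repeat with five units would be retained.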

The 155 markers were first tested on eight individuals, consisting of two individuals from each of four populations (SIM: Finland, 65°37′N, 24°52′E; P: Estonia, 59°03′N, 22°28′E; RUG: Germany, 54°34′N, 13°27′E; SPA: Sweden, 59°5′N, 11°14′E). Total DNA was extracted using a silica-based method (Ivanova et al. 2006). Primers were labelled with FAM, HEX, or TET (DNA Technology A/S, Denmark), and 10 μl PCRs were performed using 2 pmol of each primer, 1× Phusion® Flash High-Fidelity PCR Master Mix (Finnzymes) and ~2 ng of DNA. The cycling profile was: 98°C for 1 min, followed by 34 cycles of 98°C for 1 s, 58°C for 12 s, and 72°C for 20 s, with a final extension at 72°C for 1 min. The PCR products were analysed using a MegaBACE 1000 capillary sequencer and Fragment Profiler 1.2 software (GE Healthcare, Life Sciences). Markers that did not amplify, could not be scored, or did not show ≥2 alleles were removed from the marker set (94 loci removed). The reduced marker set (61 loci, Table 1) was then amplified in 40 individuals, comprising eight individuals from each of five populations (the four populations above and FVI: Finland, 60°30′N, 27°45′E). Three of these markers had very low polymorphism (i.e. close to fixation of a single allele across populations), and three other markers showed problems with scoring (i.e. single base pair changes or non-specific binding). These six markers were discarded, and the remaining 55 markers were genotyped in 25 individuals from each of three populations (P, RUG, SIM) to test for Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) using GenePop on the Web (Raymond and Rousset 1995). Four of these markers could not be scored reliably and so were removed at this stage, leaving 51 markers. Tests for HWE identified eight loci with heterozygote deficiency in at least one population; however, only one locus showed consistent problems across all three populations (Her28, Table 1). Tests for LD showed no locus pairs with significant LD in more than one population. The final set of highly reliable markers comprised 50 loci.
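Heterozygote deficiency at a locus can be illustrated by comparing observed and expected heterozygosity. The Python sketch below is descriptive only and is not the exact test implemented in GenePop; the function name `heterozygosity_summary`, the genotype encoding (pairs of allele sizes), and the use of the uncorrected expected heterozygosity are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Genotype = Tuple[Hashable, Hashable]


def heterozygosity_summary(genotypes: List[Genotype]) -> Tuple[float, float, float]:
    """Return (H_obs, H_exp, F_IS) for one locus in one population.

    H_obs is the fraction of heterozygous individuals.
    H_exp is 1 - sum(p_i^2) over allele frequencies p_i (no small-sample
    correction). F_IS = 1 - H_obs / H_exp; positive values indicate a
    deficiency of heterozygotes relative to HWE expectations.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Count every allele copy (two per diploid individual).
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n

    h_obs = sum(1 for a, b in genotypes if a != b) / n
    h_exp = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())

    # F_IS is undefined for a monomorphic locus (H_exp == 0).
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, f_is
```

For example, a hypothetical sample of 25 individuals with two equally frequent alleles but only five heterozygotes would give an observed heterozygosity well below the expected value and a strongly positive F_IS. A formal test of significance would still be needed before a locus is flagged.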

Table 1 Transcriptome-derived microsatellites

As the newly developed markers lie within transcribed genes, they may not follow neutral expectations, though few deviations were observed in this study. However, because of their linkage to transcribed genes, they may be particularly useful for examining population differentiation related to local adaptation. This application is potentially important for identifying population units for fisheries management, and subsequently for stock assignment. Comparisons among similar studies (e.g. Csencsics et al. 2010; Mikheyev et al. 2010; Nair et al. 2011) are difficult because genomes of different taxa vary substantially in size and microsatellite composition, and also because many publications do not specify their criteria for determining a potential marker. Nevertheless, the recovery of 50 highly reliable markers from half a plate of sequencing indicates that mining small 454 sequencing runs for microsatellites is highly effective.