The red-listed shorebird great snipe (Gallinago media) is an ecological model species for studies of the evolution of the lek mating system, sexual selection and mate choice (Höglund and Robertson 1990; Fiske et al. 1994; Sæther et al. 2005; Ekblom et al. 2010), as well as for migration research (Lindström et al. 2016). Previous genetic resources for this species have been limited to five microsatellites (SNIPE B2, 3, B5, 12 and 20; GenBank Accession numbers AY363298–AY363302; Sæther et al. 2007) and a few candidate gene sequences (Ekblom et al. 2007). This lack of large-scale genetic information has severely hampered population genetic studies of this charismatic bird. Here we report a large-scale development of molecular markers (both microsatellites and SNPs) using 454 transcriptome sequencing (RNA-Seq).

Great snipe males were captured for ringing using mist nets on two active leks in the Gåvålia study population in central Norway (62°17′N, 9°36′E) in the spring of 2010 (Fig. 1). For a full description of the field site and methodology, see Løfaldli et al. (1992) and Fiske and Kålås (1995). Blood was taken from the brachial vein and immediately stored in RNAprotect Animal Blood Tubes (QIAGEN). RNA was extracted using RNeasy Protect Kits (QIAGEN). Full-length cDNA was synthesised using the MINT kit (Evrogen), and the cDNA libraries were sequenced using Genome Sequencer FLX (Roche) technology. One full 454 plate was divided into two regions, with seven samples (each with an individual MID tag) run on each region (Table 1). Sequencing produced more than 800,000 reads in total, with a mean of 57,000 reads per individual (Table 1). Raw transcriptome sequence reads are available at SRA (SRA060814).

Fig. 1 Male great snipe from the study area in Norway, during ringing and blood sampling

Table 1 Summary of 454 transcriptome sequence data used in this study

After removal of adapters and index tags and quality trimming, transcriptome reads for all individuals were assembled jointly using the “cDNA mode” in GSassembler (Newbler, version 2.6, 454 Life Sciences). De novo assembly of the transcriptome produced 6367 contigs with an average length of 546 bp (total assembly length 3.5 Mbp; Supplementary material 1). A more detailed analysis of gene expression levels and patterns of selection on genetic variation in expressed genes has been described in a recently published manuscript (Höglund et al. 2017).
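
Summary statistics of this kind (number of contigs, mean contig length, total assembly length) can be recomputed from any contig FASTA file. The following is a minimal Python sketch, not the procedure used in this study; the function names `parse_fasta` and `assembly_summary` are hypothetical and the input is assumed to be plain FASTA text.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA record name to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def assembly_summary(contigs):
    """Count contigs and compute total and mean contig length in bp."""
    lengths = [len(seq) for seq in contigs.values()]
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return {"n_contigs": len(lengths), "total_bp": total, "mean_bp": mean}


# Toy input for illustration only; these are not great snipe sequences.
toy_fasta = ">contig1 example\nACGTACGT\nACGT\n>contig2\nGGCC\n"
toy_summary = assembly_summary(parse_fasta(toy_fasta))
```

Applied to the assembly reported here, such a summary would be expected to give 6367 contigs with a mean length of about 546 bp.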

We used msatcommander (Faircloth 2008) to identify microsatellites, searching for di- to hexanucleotide repeats in all contigs and unassembled reads. We found a total of 815 microsatellite repeat sequences in the great snipe transcriptome (Table 2). For 140 of these, sufficient flanking sequence was available to allow PCR primer design. Detailed information about these loci, including repeat type and suggested primer sequences, is given in Supplementary material 2.
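
To illustrate what a search for di- to hexanucleotide repeats involves, the sketch below scans a sequence for perfect tandem repeats using regular expressions. It is not msatcommander and does not reproduce its algorithm or settings; the minimum repeat counts in `MIN_REPEATS` and the function names are hypothetical values chosen for the example.

```python
import re

# Hypothetical minimum number of tandem copies per motif length (2-6 bp).
# These thresholds are assumptions, not the settings used in this study.
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4, 6: 4}


def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ACAC')."""
    n = len(motif)
    return any(
        n % k == 0 and motif == motif[:k] * (n // k)
        for k in range(1, n)
    )


def find_microsatellites(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, copies) tuples for perfect repeats.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_compound(motif):
                # Skip homopolymers and motifs already covered by a shorter unit.
                continue
            copies = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Toy sequence for illustration only: an (AC)8 and an (AGC)5 repeat.
toy_seq = "GGTTCA" + "AC" * 8 + "TTGCA" + "AGC" * 5 + "GTCA"
toy_hits = find_microsatellites(toy_seq)
```

The reported start and end coordinates can then be compared with the sequence length to judge whether enough flanking sequence remains on each side for primer design.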

Table 2 Number of microsatellite repeat sequences identified from the great snipe transcriptome

We performed read mapping and SNP calling using GSmapper (Newbler, version 2.6, 454 Life Sciences) with the transcriptome contig file as the reference sequence. In total, we identified 2874 variable positions (SNPs) in the great snipe transcriptome, 2434 of which had at least 60 bp of flanking sequence on both sides to allow primer design for genotyping. Detailed information about these SNPs, including flanking sequences, is given in Supplementary materials 3 and 4. Of 48 evaluated SNPs, 39 were verified as true polymorphisms by independent genotyping of a larger sample of individuals on the Illumina GoldenGate SNP genotyping platform (Höglund et al. 2017). Four of the tested SNPs were found to be monomorphic and five did not produce reliable genotype calls.
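
The flanking-sequence criterion and the validation tally above can be expressed in a few lines of Python. This is a minimal sketch under stated assumptions: SNP positions are taken to be 0-based offsets within a contig, and the function names `has_sufficient_flanks` and `validation_summary` are hypothetical. Only the 60 bp threshold and the counts (39 verified, 4 monomorphic, 5 failed) come from the text.

```python
def has_sufficient_flanks(snp_pos, contig_len, min_flank=60):
    """True if at least min_flank bases lie on each side of the SNP.

    snp_pos is assumed to be a 0-based position within the contig.
    """
    left = snp_pos                     # bases before the SNP
    right = contig_len - snp_pos - 1   # bases after the SNP
    return left >= min_flank and right >= min_flank


def validation_summary(n_polymorphic, n_monomorphic, n_failed):
    """Summarise the outcome of a SNP validation experiment."""
    evaluated = n_polymorphic + n_monomorphic + n_failed
    called = n_polymorphic + n_monomorphic
    return {
        "evaluated": evaluated,
        # Verified SNPs as a fraction of all evaluated SNPs.
        "fraction_verified": n_polymorphic / evaluated,
        # Verified SNPs as a fraction of those with reliable genotype calls.
        "fraction_verified_of_called": n_polymorphic / called,
    }


# Counts reported in the text above.
summary = validation_summary(n_polymorphic=39, n_monomorphic=4, n_failed=5)
```

With the reported counts, 39 of 48 evaluated SNPs (about 81%) were verified, or 39 of 43 (about 91%) when only SNPs with reliable genotype calls are considered.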

Two hundred and seventy-eight of the SNP-containing contigs were annotated using a BLAST approach, matching against chicken genome and transcriptome sequences (Supplementary material 5). Note that the markers described here (both SNPs and microsatellites) are situated in transcribed parts of the genome. They are thus likely to be linked to functional genes and cannot be assumed to be selectively neutral (Ekblom and Galindo 2011).