Background

Understanding the genetic basis of adaptive polygenic phenotypes is a major challenge in evolutionary biology, in particular for complex behavioural phenotypes. Aided by development of high-throughput sequencing technology, recent studies have made significant progress in identifying gene loci that shape polygenic phenotypes [1]. It has become clear that short-term responses to selection usually involve gene regulation, rather than changes in protein-coding sequences [2,3,4,5,6,7]. However, we currently have limited understanding of how these observed adaptive changes in gene expression and phenotype are specified by variation at genomic loci, which ultimately form the basis of inheritance. To understand how complex phenotypes evolve, it is therefore important to link transcriptomic and phenotypic changes to genomic changes during the same episode of adaptation.

A behavioural trait that responds readily to selection is associative learning ability and memory formation, which are part of an organism’s cognitive repertoire [8, 9]. Learning and memory enable an organism to use information from previous experience for a functional change in behaviour in response to changing situations [10]. This is specifically relevant when environments vary within the lifetime of an individual, as it enables the establishment of predictive relationships [11]. Most if not all animals studied have demonstrated the ability of at least a simple form of learning [12, 13], yet the diversity in cognitive abilities is enormous. Studies on a large number of species have revealed large intra- and interspecific variation in how quickly information can be learned and how long memory will last [14], even between closely related species [15, 16] or different populations of the same species [17]. It is commonly assumed that the wide variety in learning abilities reflect adaptations to the differing natural conditions under which these animals operate and that specific learning abilities are the net result of the costs and benefits of learning under the encountered conditions [18,19,20,21]. However, the genetic architecture that shapes natural variation in learning and memory dynamics remains poorly understood [22].

Studies using laboratory-generated mutants of Drosophila melanogaster (like e.g. dunce or rutabaga) have been successful in identifying single loci with large effects on memory formation [23,24,25]. It remains to be investigated how much the specific genes described for these mutants contribute to natural, segregating variation in learning ability and memory retention. Such variation in cognitive ability may very well depend on more subtle variation in the genes described in these mutants or a wide range of completely different genes that are important for underlying neural pathways or processes, like e.g. the many genes associated with the dopaminergic neural circuitry [26] or neural plasticity in brain cells that influence approach or avoidance behaviour [27]. In addition, focusing on sequence evolution of target genes neglects processes of regulatory evolution (e.g. gene expression) and its contribution to the evolution of adaptive phenotypes. For example, induced expression of a specific splice variant of cAMP-responsive transcription factor CREB in D. melanogaster results in long-term memory formation after a single conditioning trial, whereas this normally requires 10 spaced conditioning trials [28]. A GWAS study on educational attainment in humans localised a disproportionate number of SNPs in regions that regulate brain-specific gene expression or regions associated with histones marked in the central nervous system [29]. An animal’s behavioural repertoire is influenced by the existing neural architecture and its physiology, which can be strongly affected by changes in regulatory regions and translational repressive mechanisms. Such repressive factors are also found to be relevant during memory formation [30] and can include translation initiation or elongation factors [31, 32] and microRNAs [33]. When studying complex behavioural traits like learning ability and memory retention, it is therefore insightful to consider both gene expression differences and changes in gene allele frequency in parallel. ‘Evolve-and-resequence’ experiments are a powerful way of detecting allele frequency changes in response to selection on various traits [34,35,36,37,38]. Here we use this approach to screen both the genome and the transcriptome of a parasitoid wasp for loci that respond to selection on learning ability.

The parasitoid wasp Nasonia vitripennis has become a model for evolutionary genetics approaches since it combines convenient genetic features with an interesting biology and behavioural repertoire [39]. The females are very sensitive to cues that help them find concealed hosts in a complex environment, making these wasps excellent subjects for studying associative learning ability. QTLs for memory-related phenotypes were identified by introgressing two related Nasonia species [40]. Also, differences in gene expression induced by a learning experience were found between the same two species, resulting in a list of candidate genes that may regulate long-term memory formation [40]. These studies benefit from using isofemale lines with limited genetic variation, which increases the contrast of a potential genetic difference between the different lines. However, such studies do not reflect situations under natural conditions where polygenic adaptation, especially at the early stages of selection, draws on standing genetic variation in populations [41, 42].

In this study, we exploit variation segregating in a natural population to identify the genetic and transcriptional basis of associative learning ability. Specifically, we assessed how selection on associative learning and memory formation impacts gene allele frequency and gene expression. In an earlier study, replicate lines of N. vitripennis were selected for rapid associative learning ability in associating a colour with host presence [8]. Associative learning ability responded within 10 generations of selection and proved to also extend to associative learning of other cues and rewards besides the visual associative learning task that was selected for [8]. We jointly analysed genome-wide allele frequencies and gene expression levels in head tissue of adults of the replicated, evolved populations. If genes involved in learning ability are under cis-regulation, we might expect to find significantly diverged SNPs in the regulatory regions of genes that show differential expression. Alternatively, we may expect that sequence evolution and regulatory changes do not involve the same genomic regions, but affect genes that are functionally related or part of the same interaction network. We therefore constructed a gene network using the genes identified by our genomic and transcriptomic analyses. This integrative approach reveals how a complex trait like associative learning ability can readily respond to selection via a complex interplay between regulatory and sequence evolution.

Results

Selection for associative learning ability

Experimental procedures and phenotypic responses to selection for each pair of selected and control lines are summarized in Fig 1. Associative learning ability responded swiftly to selection: within 10 generations of selection, the response in forming learned associations was significantly increased in the four selected lines [8]. The baseline associative learning ability of the four control lines showed no consistent in- or decrease.

Fig. 1
figure 1

Artificial selection on associative learning ability in N. vitripennis.a Schematic overview of the selection experiment. Females were allowed to oviposit on host pupae placed on either a yellow or blue background. 24 h later, the females were placed in a transparent tube with yellow and blue outer ends. Only females that chose the background colour corresponding to the colour on which they had been trained were used to found the next generation. This procedure was repeated ten generations in four replicate selected lines. Each selected line was paired to a control line that was treated following the same procedure, except that in the control lines, females were randomly chosen from the tested wasps. Nasonia’s after a photo by Jitte Groothuis. b Changes in associative learning ability (mean performance index over four replicate lines) over ten generations of selection in selected and control lines. For details, see [8]

Genome sequencing statistics

To identify the genetic variants contributing to evolved differences in associative learning ability, we performed pooled whole-genome sequencing on four selected and four control N. vitripennis lines. We obtained on average 77,796,493 reads per library, totaling 622,371,946 genomic read pairs across the eight samples. The average mapping rate to the reference genome was 92%. Mean coverage per sample ranged between 25x and 139x. Sequencing statistics for the eight genomic DNA libraries are summarized in Additional file 1.

Allelic divergence

A total of 2.6 M variant positions passed a series of stringent filters (see Methods). Patterns of allelic divergence differed between pairs of selected and control lines (Additional file 2). Most of these allele frequency differences are likely to be due to genetic drift in each of the lines. Only variants that differed consistently between the four selected and four control lines are likely to be due to the selection regime. While there may also have been line-specific responses to selection, such effects would be indistinguishable from genetic drift. We therefore used a generalized linear mixed model to identify variants that had consistently diverged in response to selection for associative learning ability, while taking into account differences in sequencing depth and idiosyncratic differences in allele frequency between replicates [35]. After adjusting for multiple testing, we identified 118 variants as significantly diverged between selected and control lines (Additional file 3). These variants were distributed over 12 genomic scaffolds, with 1 to 46 significant variants per scaffold and clustered in 51 distinct genomic regions separated by 60 kb or more. Three scaffolds contained 19 or more significant variants (Fig. 2). FST values were elevated in regions containing clusters of consistently diverged SNPs (Additional file 2).

Fig. 2
figure 2

Genome-wide association study of associative learning ability. Manhattan plots for selection treatment in the full dataset (including all four replicate pairs). The –log10P values are plotted against the position on each genomic scaffold. Red dots indicate variants for which the FDR-adjusted P < 5e-6. a Chromosome 1 (scaffold NC_015867.2) (b) Chromosome 2 (scaffold NC_015869.2) (c) Unplaced scaffold NW_001820749.1

Effect prediction

We determined which genes were potentially affected by the 118 consistently diverged SNPs by selecting all genes within 10 kb of each of these SNPs according to the official annotation of the N. vitripennis genome. In addition to exons and introns, we included the regions 10 kb up- and downstream of each gene, as these may include important regulatory sites. A total of 128 genes were located within 10 kb of a consistently diverged SNP (Additional file 4). We retrieved gene ontology (GO) terms associated with each gene. If our selection regime primarily affected genes with cognitive functions, GO terms associated with such function would be overrepresented among our gene set. No GO terms were overrepresented among genes associated with consistently diverged SNPs, indicating that a substantial number of consistently diverged variants had likely diverged as a result of hitchhiking, due to linkage to loci under selection.

Similarly, we did not expect the 128 genes of interest to show an overall bias towards expression in any particular tissue because of hitchhiking. However, we expected that genes involved in associative learning ability would be most strongly expressed in head (brain) tissue. We used data available in WaspAtlas [43] to assess in which tissues these 128 genes are known to be (primarily) expressed, comparing relative expression values in the available tissue-specific expression data (female and male head, female whole-body and male testis). The results (Fig. 3a) showed that 30 of the 128 genes in the vicinity of diverged SNPs had increased expression in female head tissue. The genes with the most strongly increased expression in female heads included: LCCH3, serine protease 133, putative polypeptide N-acetylgalactosaminyltransferase 9, lachesin, prickle-like protein 3 and prominin-1. By contrast, a small group of genes were conspicuously higher expressed in testis tissue (Fig. 3a). We expect that chance associations are the cause of this pattern, due to hitchhiking of many significantly diverged genes with a small number of brain-related genes that responded to selection. Alternatively, by selecting only females during the selection experiment, we may have skewed allele frequency of sexually antagonistic genes.

Fig. 3
figure 3

Tissue- and life stage-specific expression of candidate genes. Heatmap of expression patterns of the candidate genes identified in the genomic analysis in different (a) tissues (log transformed) and (b) life stages. Source: WaspAtlas

Another possible explanation for the paucity of genes with relatively high expression in female head tissue among genes near diverged SNPs is that important genes involved in brain development may act in earlier developmental stages, in particular in the pupal stage. To investigate this, we compared the levels of expression of genes near significantly diverged SNPs in ovaries (eggs), early and late embryos, larva, pupae and adults using data available in WaspAtlas. The results (Fig. 3b) show clusters of genes that are either highly or lowly expressed during a particular developmental stage across all tissues. About a third of the candidate genes show stage-specific expression (Fig. 3b). Genes showing highest expression in pupae, but low expression in embryonic, larval and adult stages included low-density lipoprotein receptor-related protein 6-like, putative odorant binding protein 62, and two uncharacterized proteins.

Most diverged SNPs (62.7%) were located in introns. Smaller numbers of diverged SNPs were located outside genes (28.8%) or in coding regions (8.5%). For eight genes we found significantly diverged SNPs in their coding regions (Table 1, Additional file 5). Four SNPs resulted in amino acid changes in genes encoding pyrokinin, turripeptide Pal9.2like, kinectin and an uncharacterized protein (Table 1). A further five SNPs in coding regions did not change the amino acid sequence (Table 1).

Table 1 Genes carrying consistently diverged SNPs in their protein-coding regions

Transcriptome sequencing statistics

To compare gene-specific expression levels between the eight replicate N. vitripennis lines (four selected and four control lines), we sequenced mRNA extracted from female heads. We obtained on average 101 × 106 raw PE 100 bp reads per library of which we retained 95% high quality reads on average. Removing transcripts with no or low expression yielded a total of 17,007 expressed transcripts, out of a total of 26,079 in the reference transcriptome. A complete overview of the sequencing information is presented in Additional file 6.

Gene expression

We used three complementary analyses to identify transcripts that showed significant and large changes in expression, consistent across the four pairs of selected and control lines. Differential expression analysis in edgeR yielded 21 significant genes, pairwise χ2 tests yielded 23 significant genes and principal component analysis yielded 17 significant genes (see Methods for details, Additional files 7, 8, 9 and 10). Combining these sets of transcripts and removing those with less than two-fold difference in expression between control and selected lines resulted in 36 significant genes (Additional file 10). Rank-rank hypergeometric overlap (RRHO) analysis showed correspondence between our differential expression results and those obtained by [40], who compared gene expression in trained and untrained N. vitripennis (Fig. 4a). Specifically, there was overrepresentation of transcripts that were either up- or downregulated in both our selected lines and in trained wasps (Fig. 4a). Surprisingly, none of these transcripts matched genes identified as having diverged in allele frequency between selected and control lines in the genomic analysis. RRHO analysis similarly showed no overlap between the lists of differentially expressed genes and those showing differences in allele frequency (Fig. 4b). However, when we examined pairs of selected and control lines separately, we did observe weak, but significant correlations between differential gene expression and FST for three of the four pairs (Additional file 11).

Fig. 4
figure 4

Rank-rank hypergeometric overlap (RRHO) analysis, comparing the differential gene expression results to (a) results obtained by [40], who compared gene expression between trained and untrained wasps, and (b) allele frequency difference (ranked by P). Overrepresentation of transcripts in the lower left and upper right of (a) indicate correspondence between the two datasets at the top and bottom of the sorted gene expression lists. No such overlap is seen in (b), where the only overrepresentation occurs at low FST and non-differential gene expression

For 14 transcripts, expression was elevated in the selected lines while the remaining 22 transcripts showed reduced expression (Fig. 5, Table 2). Neither of these sets of transcripts was enriched for any GO term. Interestingly, seven transcripts showed completely eliminated expression as a result of selection in at least three of the four replicates, while for seven other transcripts the reverse was the case: from zero expression in the control lines to high expression in the selected lines (Additional file 12). Thus, for 14 out of 36 significant transcripts, the evolved increase in associative learning ability was associated with a near-complete on or off switching of expression. Finally, two genes each expressed two alternative splice variants that showed complete opposite patterns of expression in response to selection, i.e. one transcript went from low to high expression upon selection while the other transcript of the same gene went from high to low expression. The first gene, Rho GTPase-activating protein, showed an 8-fold reduction in expression of one of the splice variants, the greatest among all transcripts, while the other transcript had virtually zero expression in the control lines but increased to the expression level of the alternative splice variant upon selection. The other gene, cap-specific mRNA (nucleoside-2’-O-)-methyltransferase 2, displayed a similar pattern (Additional file 13).

Fig. 5
figure 5

Evolved expression changes in response to selection for increased associative learning ability. Heatmap shows log2 Fold Change between control and selected lines for four replicated pairs of lines (in columns), and 36 significant transcripts in rows (sorted from low to high Fold Change). Positive Fold Change values (in green) represent increased expression in selected compared to control lines, and negative values (in red) represent reduced expression. See Table 2 and Additional file 10for details and statistics

Table 2 Evolved expression changes in response to selection for enhanced associative learning ability. Thirty-six transcripts, expressed from 34 loci, showed evidence of evolved expression regulation that was consistent across four replicate pairs of lines, as identified by three complementary methods (see Methods and Additional file 8). Transcripts are in order of increasing Fold Change, with high Fold Change indicating elevated expression in selected lines

Gene network analysis

To explore potential functional relationships between the candidate genes identified by our genomic and transcriptomic analyses, we constructed a gene network for all 113 Drosophila melanogaster orthologs of our candidate genes (Additional files 4 and 6) in GeneMania [44]. This resulted in a co-expression network that included 77 genes with diverged SNPs and all 27 differentially expressed genes for which a Drosophila ortholog was known (Fig. 6a). The co-expression network consisted of 638 edges and contained two sub-networks of genes with shared protein domains (13 and 7 genes per clusters; Fig. 6b). Furthermore, several of the genes in the network code for proteins that have been shown to physically interact with the proteins of several other genes in the network. These genes may constitute putative hub genes in the network and included ATPase TER94.

Fig. 6
figure 6

Gene network for D. melanogaster orthologs of genes that responded to selection for increased associative learning ability in N. vitripennis. Nodes represent genes and edges are based on co-expression (center). Two subnetworks where edges represent shared protein domains are shown on the left. Three putative hub genes that each have physical interactions with > 6 other genes within the network are indicated on the right. Genes with significantly diverged SNPs are indicated in yellow circles, differentially expressed genes in orange squares and highly connected genes added by GeneMania in grey

Discussion

We identified 118 genomic loci that showed consistent differences in allele frequency between replicated selected and control lines. A total of 128 genes were located in close proximity to significantly diverged loci. Transcriptome analysis of the same selected and control lines identified 36 transcripts as consistently differentially expressed. None of these matched the significantly diverged genomic loci. However, network analysis showed that many of the genes near diverged loci clustered with the differentially expressed genes in a single co-expression network.

Four of the significantly diverged SNPs at the 118 candidate genomic loci changed the amino acid sequence of the genes product. These SNPs resulted in missense mutations in the genes pyrokinin (involved in neuropeptide signaling pathway), turripeptide Pal9.2like (an ion channel inhibitor and neurotoxin), kinectin (transmembrane protein interacting with RhoGTPases) and an uncharacterized protein. A further five diverged SNPs in coding regions did not change the amino acid sequence. This included a synonymous SNP in the gene clarin− 3, which encodes a transmembrane protein expressed in cochlear hair cells and neural retina and which is associated with cognitive performance in Parkinson’s disease in human GWAS [45]. The remaining SNPs were located in introns or outside protein-coding regions. Some of these were in or near genes with known functions in insect learning (see [46] for an overview) and may have affected their expression. These included genes encoding serine protein kinases, gated channels, cytoskeleton components (actin), a splicing factor, several neuropeptides and several odorant-binding proteins. In addition, candidate SNPs were found near several histone-related genes and at least 1 methyltransferase, pointing to epigenetic regulation. One SNP was located downstream of an ortholog of latheo, of which knock-down mutants in D. melanogaster show learning and memory deficits [47].

The 36 differentially expressed genes likewise included genes encoding a serine protease, a transporter gene, a kv channel and a cap-specific methyltransferase. Especially interesting is a differentially expressed gene encoding a RhoGTPase, as one of the genes carrying a missense candidate SNP identified in the genomic analysis encoded a kinectin, which is a transmembrane protein that interacts with RhoGTPases. RhoGTPases are key regulators of dendritic morphology that integrate environmental cues, such as neuronal activity [48].

None of the differentially expressed genes matched those identified in the genomic analysis, although differential expression and allelic divergence were weakly correlated at the level of pairs of selected and control lines. A number of factors may account for this observation. First, some of the diverged genes may function in early development, rather than in adults, which was the only life stage we sampled for the transcriptome analysis. Several diverged genes did indeed show strongest expression during the pupal stage (WaspAtlas). This included low-density lipoprotein receptor-related protein 6-like, the mammalian ortholog of which is critical for synaptic function and cognition [49]. In humans, variants in this gene have been linked to Alzheimer’s disease risk [49]. Second, some of the diverged loci may affect protein structure and function, rather than transcription. Last, the candidate genes showing evolved expression differences may all be regulated in more complex, indirect ways. Gene network analysis revealed all differentially expressed genes to be co-expressed with 77 of the genes with diverged SNPs, at least in Drosophila. Furthermore, three genes were known to interact physically with many other genes in the network. This included the ATPase TER94, which is involved in dendrite morphogenesis and neuron apoptosis in D. melanogaster [50]. Further work is required to distinguish between these possibilities.

Evolved differences in gene expression associated with improved associative learning, as studied here, at least partially targeted existing regulatory pathways of the process of learning and formation of memory per se. The list of most strongly up- or downregulated genes showed overlap with that obtained by an earlier study [40], that compared the head transcriptomes of N. vitripennis with and without a learning experience. This suggests that the evolutionary response to selection in terms of gene expression attenuated the gene expression differences induced by learning in unselected wasps. This effect was only detected using sensitive RRHO analysis, but not by comparing only the lists of differentially expressed genes. Similarly, [46] identified genes that were differentially expressed between conditioned and unconditioned individuals of two other species of parasitoid wasp and found that even for different types of memory formation (e.g. short- versus long-term memory formation) within one species, the overlap in differentially expressed genes was only 9% [46]. The diffuseness of overlap between transcriptomic studies of different aspects of learning ability points to the highly specific nature of gene expression involved in different processes of learning and memory.

Conclusions

In conclusion, both the genomes and the transcriptomes of the selected lines changed in response to selection on associative learning ability, however there was no overlap between the two gene sets. Further studies are required to assess the generality of this finding. A recent study on the adaptive response to toxicity in populations of killifish found genomic and transcriptomic changes to affect the same pathways [51]. On the other hand, artificial selection on resistance against parasitoid attack in D. melanogaster yielded allelic changes at many loci [34], but no differential expression in any of these loci after parasitoid attack [52], and only very limited overlap with genes that changed in expression during egg-larval development in the selected lines [53]. In our study, we found many of the diverged gene loci to cluster in the expression network with the differentially expressed genes. We therefore conclude that selection acted on loci that affected the phenotype through complex mechanisms, including gene interactions and gene regulation at multiple loci.

Methods

Study organism & selection experiment

Nasonia vitripennis (Hymenoptera: Chalcidoidea) is a generalist gregarious wasp that parasitizes pupae of large dipterans (e.g. Calliphoridae) [54]. The founding N. vitripennis population for our selected and control lines was the HVRx strain which was originally established from females collected at five field sites in the Netherlands and is maintained with relatively high genetic variance [55]. Analysis of 40 microsatellite loci showed that the effective population size is kept at Ne = 173, expected heterozygosity was He = 0.56 and allelic richness was around 3.4 [55]. Genomic SNP density per 100 kb window averaged 0.13% [55]. At the start of the selection experiment females were randomly assigned to four pairs of selected and control lines (A, B, C, and D). Environmental factors like season, air pressure and (seasonal) host quality can drastically influence behaviour and motivation, introducing substantial variation in the learning response between generations. To control for this, sets of selected and control lines were always conditioned and tested on the same day [8]. We selected for rapid associative learning and memory retention in female wasps using appetitive conditioning to associate a colour with a host reward. Each generation, the females and males that emerged were kept together for 2–3 days for mating and maturation, after which ca. 160 mated females per line were conditioned (Fig. 1a). To avoid increasing sensitivity to a particular colour, half of the wasps were conditioned on the colour blue and the other half on the colour yellow at each generation. When given a choice between the reinforced colour and a neutral colour in a T-maze 24 h later, wasps that learned the association between colour and reward will move towards the conditioned colour. As a measure of this learned preference we calculate a ‘performance index’ based on the distribution of wasps in the T-maze between the two reciprocal groups (conditioned on yellow or blue). This index can be expressed as a percentage and ranges from 0 (no learned preference) to 100 (perfect learned preference) and controls for innate colour preferences [8]. Only the wasps that demonstrated a learned preference for the conditioned colour were chosen for the selected lines (Fig. 1a), while in the control lines wasps were chosen randomly. Each generation, 50 mated females per line were allowed to reproduce. The experiment was continued for 10 generations under continuous selection, the results and details of this experiment are described in [8]. Although learned preferences showed considerable variation between lines and generations, the mean learning response (performance index) of the selected lines increased over time, while that of the control lines did not (Fig. 1b; [8]). No detrimental effects of the selection regime were observed; we detected no difference in relative brain volume, lipid content or body size between selected and control lines and a minor difference in longevity. Innate colour preference of the wasps remained constant throughout the experiment [8].

DNA extraction, sequencing and alignment

Pools of 40 females per selected and control line were collected 24 h after being conditioned on a colour, as during the selection experiment. DNA was extracted from these pools using the following protocol. Groups of ten females were rinsed in 70% ethanol, vacuum dried and crushed in 200 μl phosphate buffered saline. After adding 200 μl Nuclei Lysis Solution, 5 μl RNAse (4 mg/ml, Promega®) and 4 μl Proteinase K (20 mg/μl, Roche®), the samples were incubated for 15 min at 60 °C. A further 340 μl DNA Lysis buffer (Promega®) was added and after a spin down (10 min. at full speed) the supernatant was collected on a spin column (Promega®). After 3 wash steps the DNA was eluted in 100 μl H2O.

Length and integrity of the DNA molecules were checked on a 1% agarose gel. Concentrations of DNA and RNA were measured on a Qubit® device and purity was assessed using Nanodrop. Equal amounts from the four DNA isolations per line were pooled, creating eight DNA samples for sequencing.

Each pooled DNA sample was fragmented to 300 bp using a Covaris sonicator. DNA fragments were size selected and Illumina libraries were constructed at the Leiden Genome Technology Center at the Leiden University Medical Center (Leiden, The Netherlands). All samples were sequenced on an Illumina HiSeq2000 (100 bp paired-end). Additional sequence data was generated for one sample (selected line D) on an Illumina NextSeq500 (125 bp paired-end). Initial quality checks of the raw reads were conducted using SGA preprocess [56] and fastqc [57]. Reads were mapped to the N. vitripennis reference genome (nvit_2.1) using Bowtie2 [58] using default parameters. Duplicate reads were removed using PicardTools (https://broadinstitute.github.io/picard/) and indel realignment was conducted using GATK [59].

Genotyping

A single mpileup file was drawn from the bam files for the eight selected and control lines using Samtools [60]. This was converted to a sync file as input for Popoolation2 [61] using the script mpileup2sync.jar with –min-qual 20. The sync file was then converted to a vcf file using a custom python script. A total of 72 M variant positions were initially identified as biallelic. We selected only variant positions for which the total coverage was > 10 for all selected and control lines and the nonreference allele had a coverage > 5 for at least one of the lines. An additional 11 M variant positions were identified as triallelic. Most (89.1%) of these loci were actually biallelic, with either both alleles different from the reference or a third ‘allele’ with very low coverage that was probably a sequencing error. We selected loci that had an overall coverage > 10 for all selected and control lines and for which two (but not three) alleles had a coverage > 5 in at least one of the selection lines. A total of 2.6 M variant positions passed these criteria.

F ST

To visualize genetic divergence between each selected line and its paired control line, we calculated FST in 10 kb windows using Popoolation2. The original sync file (see under genotyping) was used and the maximum coverage was set at 2%.

Generalized linear mixed model

To identify consistent allele frequency differences between selected and control lines while controlling for the extreme variance in sequencing depth among the lines, we implemented a GLMM using the R package lme4 following [35]. Read counts for the two alleles at each variant position were the response variable in the model and errors were assumed to follow a binomial distribution. Selection regime (selected / control) was specified as a fixed effect and line (one of eight; selected A, control A, etc.) as a random factor. P values for each variant position were obtained using Wald tests and were FDR-adjusted for multiple testing using the method of [62]. We considered any variant with an adjusted P value below 5e-6 statistically significant. The potential effects on nearby genes were annotated using SnpEff [63]. GO enrichment was assessed using the overrepresentation module in WaspAtlas [43]. The level of expression of each gene during different developmental stages and tissues was investigated using data available in WaspAtlas. Five significantly diverged SNPs in coding regions were validated using Sanger sequencing. The same pooled DNA samples that were used for Illumina sequencing were used as templates. Allele frequencies of the focal SNPs were estimated from the height of each of the two overlapping peaks in Vector NTI v.11.0.

RNA extraction, sequencing and alignment

For each of the four selected and four control lines, we dissected heads from 15 individual, 4-day old, adult females that had been collected 24 h after being conditioned on a colour. All heads were snap frozen in liquid nitrogen and stored at − 80 °C until RNA isolation, pooling all heads from the same line into a single sample. RNA was extracted using the RNeasy® Micro kit of Qiagen®, following the manufacturer’s protocol “Purification of Total RNA from Animal and Human Tissues” and eluted in 2 times 14 μl H2O. RNA concentration and purity was assessed using Qubit. mRNA size selection, reverse transcription, paired-end library preparation and sequencing were performed in-house at Macrogen. The eight paired-end libraries of 2 × 100 bp (average insert size 186 bp) were sequenced across three lanes on an Illumina HiSeq 2500. Read quality using FastQC v. 0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Trimmomatic v. 0.3 [64] was used to trim and remove low quality reads. We removed reads containing adaptor sequence, and removed bases if the Phred score over a four base sliding window was below 15, or if the leading or trailing bases had a score below 3. We dropped read pairs entirely if either read pair after trimming was less than 36 bases. We used RSEM [65] 1.2.26 with Bowtie2 [58] v. 2.2.6, both with standard parameters to map the reads to the reference transcriptome (GCF_000002325.3_Nvit_2.1_rna.fna from NCBI, annotation version 101) and quantify expression (i.e. read abundance) at transcript level. This version of the reference transcriptome was constructed with the aid of RNAseq data from several different tissues, including adult heads [66]. Next, we filtered the resulting count data by minimum expression, only retaining transcripts whose expression was at least one Count Per Million (CPM) in at least one of the eight libraries.

Differential expression analysis

We aimed to identify transcripts showing expression responses to the selective environment that were concordant across the four replicate line pairs. Given the strong line-by-selection interactions (see principal component analysis below), which with our experimental design could not be estimated directly, we employed three complementary analysis methods. To reduce the number of false positives, we combined these methods with strict filtering criteria. First, we used standard general linear model differential expression analysis, as implemented in edgeR to model untransformed raw count data for each transcript separately as a function of selection treatment and line, but not their interaction, given that our design did not allow for the independent estimation of the line-by-selection interaction. We tested the effect of the selective environment by performing likelihood ratio tests between models with and without the selection treatment effect, yielding a likelihood ratio and corresponding P value, which we corrected for multiple comparisons using Benjamini and Hochberg’s False Discovery Rate [62] (FDR), accepting an FDR of 0.05. We only retained transcripts that showed concordant expression changes across all four line pairs, and had an absolute Fold Change > 2, resulted in 21 significant genes. Second, we tested independently evolved expression differences within each line pair using a series of pairwise χ2 tests to compare untransformed, raw counts between the selected and control library of each line pair (each a pool of 15 individuals). The subtotals across all genes for the selected and control library were used to compute expected counts (under the null hypothesis of no differential expression). This yielded a χ2 and corresponding P value for each transcript, and we retained only transcripts with FDR < 0.05 in all four comparisons, concordant expression changes in all four pairs, and an absolute Fold Change > 2. This resulted in 23 significant transcripts, eight of which were unique to this method. These eight genes were missed by edgeR, possibly due to confounding interaction effects. Finally, to explore the variance structure in the expression data, we performed a principal components analysis (PCA) using the function prcomp in R [67] (v.3.3.2) on the expression data, first normalised to CPM using Trimmed Methods of Mean (TMM) as implemented in edgeR [68] v. 3.14.0, and log transformed. This analysis revealed strong effects of selection line on transcriptional variation, both independent of, and in interaction with the selective regime. PC 1 separated line pairs A and B from C and D, independent of selective regime, and PC 2, 4, and 5 separated selected from control lines for some replicates, but not for all four. Only PC 6 consistently separated selected from control lines in all four replicates, which was the pattern we were most interested in (Additional file 7). We then used this PC to find consistently evolved expression changes. For each gene, we computed a Pearson’s correlation between its normalized expression and the loading of that gene along PC 6. Given our low power to detect significant correlations with only eight data points, we did not correct the resulting P values for multiple comparisons, but instead kept the transcripts with P < 0.05. We filtered again by concordance in expression across the four replicate line pairs, as well as by Fold Change > 2. This final method yielded 17 significant transcripts, four of which were unique to this method. See Additional files 7,8, 9 and 10 for an overview of the significant transcripts discovered by each method.

We extracted transcript names, gene names and genomic location for each transcript using its accession in the reference transcriptome, and the reference gff3 annotation file at NCBI, both for Nvit2.1, annotation 101 (https://www.ncbi.nlm.nih.gov/assembly/GCF_000002325.3/), supplemented with additional annotation information from WaspAtlas [43]. We used rank-rank-hypergeometric overlap analysis [69], as implemented on http://systems.crump.ucla.edu/rankrank/rankranksimple.php, to compare the degree of differential gene expression between selected and control lines to differential gene expression obtained by [40] and allele frequency differences. This algorithm steps through two gene lists ranked by either the degree of differential expression or allele frequency difference (P from GLMM), successively measuring the statistical significance of the number of overlapping genes [69]. To further assess the correspondence between the differential expression and allele frequency differences, we compared these values for each pair of selected and control lines using Pearson’s χ2 tests.