Introduction

One of the central goals of evolutionary biology is to understand how organisms cope with environmental heterogeneity (Savolainen et al. 2013). Phenotypic plasticity can solve this problem but the phenotypic response to variation in an environmental variable (i.e. phenotypic plasticity) may vary among species as well as among genotypes of the same species. This is commonly referred to as a genotype-by-environment interaction (G × E, Falconer 1990; Lynch and Walsh 1998; Via and Lande 1985). Abundant experimental data and theoretical work now provide strong support for the idea that the relative performance of different genotypes across environmental gradients may be involved in the maintenance of phenotypic plasticity and genetic variation as well as the evolution of fitness related phenotypes (Carreira et al. 2006; Ungerer et al. 2003).

The study of environmental heterogeneity has been championed by work on geographic patterns along clines, with classic examples coming from genetic clines in flies almost a century ago (Dobzhansky 1948; Sturtevant 1921). Other examples include the genetic underpinnings of traits involved in adaptation to high altitude (Storz 2002) and pigmentation (Hoekstra and Coyne 2007). Clines hold a large attraction as natural Darwinian laboratories because they allow researchers to quantify the gradual effects of changing environments on genotype, phenotype, and the interaction between them (Endler 1977). These studies have shown that while many environmental tolerance traits that vary clinally are quantitative and polygenic (Wellenreuther and Hansson 2016), they can also be located within chromosomal segments that segregate as a single locus (Tigano and Friesen 2016). If environmentally adaptive variants are chromosomally linked, either because they are found on chromosomal inversions or are in areas of low recombination, their selective coefficients act in an additive manner and they may be less likely to be lost via genetic drift and more likely to contribute to rapid adaptation (Kirkpatrick and Barton 2006). In the extreme case, complex adaptations may be accomplished through the fixation of linked sets of adaptive variants (Yeaman 2013), often called supergenes, a case in which polygenic adaptation may be more accurately modeled with single-locus models (Schwander et al. 2014). Supergenes are often found to be chromosomal inversions. Good examples come from Drosophila melanogaster where inversion polymorphisms produce latitudinally varying phenotypes for traits including heat and cold tolerance and body size (Reinhardt et al. 2014; Schrider et al. 2016) and wing mimicry polymorphisms found in Heliconius butterflies (Thompson and Jiggins 2014).

Chromosomal inversion polymorphisms are being identified at a fast rate from an increasing number of populations of organisms, e.g. insects, plants, bacteria and humans, and are commonly linked to trait and fitness variation (Hoffmann and Rieseberg 2008; Kirkpatrick and Barton 2006). Inversions are particularly common in diptera. The seaweed fly Coelopa frigida (family Coelopidae) has 5 chromosomes with inversions occuring on 3 of them (Aziz 1975). The largest of these is an α/β inversion polymorphism on chromosome I encompassing about 10% of the genome and containing ~ 100 genes (Aziz 1975). This inversion polymorphism is known to affect a wide range of life history and reproductive traits, such as development time (Day and Buckley 1980), body size (Butlin et al. 1982), longevity and female fecundity (Butlin and Day 1984), female resistance to male mating attempts (Gilburn and Day 1994; Gilburn et al. 1992) and also a male’s ability to overcome female resistance (Butlin et al. 1982).

The most striking of these phenotypic traits is a three-fold size difference in males: male homokaryotypes for the inversion type α are much larger (~ 9 mm) than β- homokaryotypes (~ 3 mm), whereas the heterokaryotype αβ is intermediate in size (~ 6 mm) (Fig. 1b, Wellenreuther pers. obs., Butlin and Day 1984). Female size is not significantly affected by karyotype (Butlin and Day 1984). Maintenance of the polymorphism is secured by strong heterosis (Collins 1978), since αβ-heterokaryotypes have higher fitness than either homokaryotype (Butlin and Day 1984, 1989). The heterokaryotype αβ has a higher viability during development than either homokaryotype, both in experiments (Butlin and Day 1984; Leggett et al. 1996) and in the field, as indicated by heterokaryotype excess (Butlin et al. 1982; Day et al. 1983, 1982b; Gilburn and Day 1994). Heterokaryotype excess increases under interspecific competition with the congener C. pilipes and under intense intraspecific competition (Butlin and Day 1984; Leggett et al. 1996). The effect is disproportionally strong in males, especially decreasing the frequency of αα-homokaryotypes (Butlin and Day 1984), possibly indicating lower viability of males of this karyotype.

Fig. 1
figure 1

a Map of C. frigida fly populations used in this study showing the frequency of α (data from Day et al. 1983). Base map from d-maps (http://d-maps.com/carte.php?num_car=23102). b Picture of an αα, αβ, and ββ male showcasing the size differential

Coelopa frigida belongs to the group of acalyptrate flies which exclusively forage on decomposing seaweed (Cullen et al. 1987) and is found along the seashores of Scandinavia (Mcalpine 1991). The α/β inversion occurs at stable frequencies in all populations so far studied in Northern Europe (Butlin et al. 1982) and North America, although there may be seasonal (Butlin 1983) and clinal variation (Day et al. 1983). However, inversion frequencies also form a cline from the North Sea to the Baltic Sea, with the α inversion becoming more rare towards the Baltic Sea (Day et al. 1983). Such clines are classic signs of selection and this cline is associated with a decrease in salinity and tidal range and associated changes in the composition of the wrackbed (i.e. the species of algae present). As salinity decreases from the North Sea to the Skagerrak and Kattegat regions there is a corresponding decrease of marine species of brown seaweeds such as Fucus spp. and Laminaria spp. and moving into the Baltic Sea, Zostera eelgrass and red algae gradually displace the brown seaweeds (Table 1).

Table 1 Spatial coordinates (WGS 1984 DD), sea surface salinity (PSU), mean air temperature, inversion frequency, and wrackbed composition of the study sites

Here we explore if the environment modulates inversion frequencies and phenotypic effects in C. frigida. For this, we study C. frigida populations in Scandinavia that show variation in karyotype frequencies (Day et al. 1983). Specifically, we test the possible effects of the α/β inversion polymorphism on body traits and development time across four suitable natural breeding substrates using a fully balanced reciprocal transplant experiment. We also elucidate the relative fitness of the inversion karyotypes on these varying substrates by examining the frequencies of the three karyotypes in different environments.

Materials and methods

Study species and sites

Inversion frequencies of C. frigida form a cline from the North Sea to the Baltic Sea, with the α inversion becoming more rare towards the Baltic Sea (Day et al. 1983). We selected four populations (Kämpinge, Kåseberga, Mölle and Steninge) from the southern part of the cline that differ in tidal range, salinity, and wrackbed composition (Fig. 1b, Table 1). We extracted climate data (sea surface salinity and air temperature) for each site from MARSPEC (Sbrocco and Barber 2013) and WorldClim databases using the R package raster (Hijmans and van Etten 2014) (Table 1).

Reciprocal transplant experiment

We conducted a fully balanced reciprocal transplant experiment involving all four locations, where larvae from all locations were allowed to grow on their own wrackbed substrate and the substrates of the remaining three locations. For each location, fly larvae and wrackbed substrate were collected between February and May 2014. The wrackbed substrate was placed in labelled plastic bags and then frozen to ensure that no viable eggs or larvae remained. Fly larvae were reared on 150 g of their own substrate in aerated plastic containers until a puparium was formed, and then kept in individual Eppendorf tubes (to ensure virginity) containing a small piece of cotton soaked in 5% sucrose solution. When a fly eclosed, the Eppendorf was moved to the refrigerator to slow down aging. Coleopa frigida is exceptionally cold tolerant and this treatment does not appear to affect survival (E. Berdan, pers. obs.). Over a maximum of ten days, adult flies were collected until a minimum of 25 couples could be set up per location. Eggs were collected from 14 to 15 mating pairs per population and subsequently mixed to populate three replicates of each population × substrate combination with 30 eggs each (4 populations × 4 substrates × 3 replicates, each one with 30 eggs, total eggs for the experiment: 1440). Replicate pots were kept at 25° C with a 12/12 h light:dark cycle and each pot (190 × 110 × 70 mm) was aerated and contained 150 g of seaweed. Pots were checked daily and all eclosing flies were sexed and then stored at − 20° C on the same day.

Morphological measurements

Morphological measurements were carried out on adult flies only. All flies were weighed (Sartorius Quintix 124-1S microbalance) to the nearest 0.0001 g. The thorax width of each fly was measured with a digital calliper (Cocraft Digital Vemier Caliper) at the widest place and to the nearest 0.01 mm. This measurement was repeated twice and the average was used. Then, the left wing and right middle leg were removed (if one side was damaged the other side was used). The wing and leg were placed under a thin glass slide and placed under the microscope and a picture was the taken using a Nikon Digital Sight DS-Fi2 mounted on a Nikon SM2800 microscope set to a magnification of two power. For the wing, we measured the distance from the prominent supra-alar bristle to the posterior margin of the wing length. For estimating leg length, we measured the full length of the middle leg. The resulting images were all labelled with an individual identifier. Measurements were conducted blind to location and substrate type to avoid bias.

Molecular determination of inversion types

In C. frigida the two most common alcohol dehydrogenase (Adh) allozymes, Adh-B and Adh-D are in complete linkage disequilibrium with the α and β forms of the inversion (Day et al. 1982a). Gel electrophoresis was used to visualise the alcohol-dehydrogenase allozymes associated with the two inversion types and this allowed for straightforward genotyping of the inversion types. Starch gels (12%) were prepared by combining 24.5 g potato starch (Sigma-Aldrich, S4251) with 90 ml TEB-buffer and 90 ml dH2O, heating the mixture, and then pouring it onto an 18.5x10 cm glass plate with a plastic frame and left to set for 30 min, after which it was put in a 4OC refrigerator overnight.

Porcelain plates with 12 wells were kept on ice packs. On each well, a fly was placed together with a small amount of carborundum powder (Fisher Chemical, #C192-500) and one drop of cold dH2O. The flies were then homogenized using a glass rod. Once homogenized, filter paper wicks were placed in the wells to soak up the homogenate. The wicks were then gently inserted into the starch gel.

Starch gels were put in the electrophoresis chamber, which was connected to a power supply (Consort E863). The two chambers of the electrophoresis machine were filled with TEB-buffer. Two cloth wicks were soaked in TEB-buffer and placed so that they touched evenly along each of the exposed sides of the gel, while still connected to the buffer. An ice pack was placed on top of the gel and the electrophoresis (350 V, 50 mA) took 1.5 h.

After electrophoresis, the gel was cut horizontally with a metal thread and only the bottom half was used for staining. The agar overlay used for staining consisted of 20 ml of 2% agar, 10 ml HCl-Tris buffer (pH 8.6), 6 ml propan-2-ol, 1.5 ml 3-(4,5-dimethylthiazolyl-2)-2,5- diphenyltetrazolium bromide (MTT), 1 ml Phenazine methosulphate (PMS), and 10 ml Nicotinamide-adenine dinucleotide (NAD). The stain was mixed and poured onto the cut surface of the gel. The covered box with the gel was then carefully put in a dark heating cabinet at 37° C for 40 min, or until the banding pattern was clear and then photographed.

Statistical analysis

We analyzed weight and development time (in days) in R (R Development Core Team 2015) using general linear models. Weight was highly correlated with all other morphological characters (Supplemental Table S1) so we only analyzed weight data and development time in a statistical framework. Weight data was normally distributed and so was analyzed assuming a Gaussian distribution. Development time is essentially count data and right skewed, so we analyzed the data assuming a Poisson distribution. For both data sets we considered individual flies to be our unit of replication and used the lme4 package in R (Bates et al. 2014) to construct a model with all factors (population, substrate, sex, and karyotype) and all two and three way interactions. We also included pot (replicate) nested within population * substrate as a random factor. We ran the ‘Anova’ function in the R package car (Fox and Weisberg 2011) on the full models and then examined the F-values for each term. Starting with a model with only an intercept we built up a series of nested models adding a single main effect, or interaction at a time by decreasing F-value (i.e. starting with the term with the highest F-value). We then used the ‘Anova’ function to compare the models in a nested fashion (i.e. each model is only compared to the one above it) and used a χ2 test to determine if they were significantly different. To estimate p-values for each of our terms we compared our best model with and without each term using the ‘Anova’ function and generated χ2 statistics with associated p-values.

Visual inspection of our morphological data indicated that males were more variable than females. We formally tested for differences in variability for each morphological measurement in a statistical framework using Krishnamoorthy and Lee’s (2014) modified signed-likelihood ratio test (MS-LRT) implemented in the R package cvequality (Marwick and Krishnamoorthy 2016). We used a Bonferroni correction to account for multiple testing (Bland and Altman 1995).

We further examined the relative proportions of the three karyotypes (αα, αβ, and ββ) in all population × substrate combinations. For these analyses we used pot as our unit of replication. We tested for differences in two ways. First, we tested for karyotype differences using a χ2 test on a 3 karyotypes × 16 population-substrate combination matrix. Second, we used a Dirichlet regression model in the R package DirichletReg (Maier 2014). The Dirichlet regression model allows us to analyze karyotypes (variables with a bounded interval that sum up to a constant—in our case frequencies of karyotypes in a population) exhibiting skewness and heteroscedasticity, without having to transform the data (Maier 2014). We used a nested approach starting with a simple model and then adding factors. Models were tested against the one above them in a nested fashion using the ‘Anova’ function in the R package car (Fox and Weisberg 2011) to determine the best model.

Results

Phenotypic data

We had different sample sizes for different treatment combinations (see figure legends for Figs. 2, 3) due to differential survival of the three karyotype categories. Three of the 96 combinations had zero flies (ex: αα Kämpinge females in Kämpinge) but we do not expect that this unduly influenced our results as we did not investigate any 4 way interactions in any of our statistical models and thus did not examine these three combinations directly. All morphological characteristics were highly and significantly correlated with each other (Supplementary Table S1).

Fig. 2
figure 2

Boxplot of weight across populations (listed at Top), substrate (listed on x-axis), sex (indicated by top (male) and bottom (female) charts), and karyotype (indicated by color, red—αα, blue—αβ, and green-ββ). Outliers are not shown for clarity. N for each population × sex combination (from left to right for each karyotype and substrate as listed in the figure): Kåseberga Male: 10, 19, 5, 2, 11, 6, 5, 9, 10, 2, 9, 6, Kämpinge Male: 3, 20, 15, 1, 9, 4, 3, 13, 19, 1, 21, 6, Mölle Male: 10, 14, 1, 6, 8, 13, 6, 12, 13, 2, 6, 10, Steninge Male: 7, 17, 7, 0, 3, 2, 5, 15, 5, 7, 16, 14, Kåseberga Female: 7, 24, 11, 2, 3, 6, 4, 12, 18, 5, 13, 5, Kämpinge Female: 1, 12, 11, 0, 5, 4, 0, 13, 18, 3, 17, 9, Mölle Female: 4, 9, 4, 4, 10, 4, 6, 6, 5, 4, 3, 8, Steninge Female: 8, 21, 8, 2, 5, 4, 8, 19, 6, 7, 21, 7. (Color figure online)

Fig. 3
figure 3

Boxplot of development time across populations (listed at Top), substrate (listed on x-axis), sex (indicated by top (male) and bottom (female) charts), and karyotype (indicated by color, black—αα, grey—αβ, and white-ββ). Outliers are not shown for clarity. N for each population × sex combination (from left to right for each karyotype and substrate as listed in the figure): Kåseberga Male: 10, 19, 5, 2, 11, 6, 5, 9, 10, 2, 9, 6, Kämpinge Male: 3, 20, 15, 1, 9, 4, 3, 13, 19, 1, 21, 6, Mölle Male: 10, 14, 13, 1, 6, 8, 6, 12, 13, 2, 6, 10, Steninge Male: 7, 17, 7, 0, 3, 2, 5, 15, 5, 7, 16, 14, Kåseberga Female: 7, 24, 11, 2, 3, 6, 4, 12, 18, 5, 13, 5, Kämpinge Female: 1, 12, 11, 0, 5, 4, 0, 13, 18, 3, 17, 9, Mölle Female: 4, 9, 4, 4, 10, 4, 6, 6, 5, 4, 3, 8, Steninge Female: 8, 21, 8, 2, 5, 4, 8, 19, 6, 7, 21, 7

We only examined weight in a statistical framework, as this morphological trait is a good proxy of the other morphological measurements (Pearson correlation coefficients with other traits varied from 0.78 to 0.89, Supplemental Table S1) while still being an easily tractable and understandable measurement. Weight was not strongly correlated with development time (Pearson’s product-moment correlation: 0.10) and therefore development time was examined separately.

We used a model building approach by first examining the effect sizes of a full model and then using these to inform our model building. F-values from the full models indicated that effect sizes were much larger for weight than development time (Table S2). The best model for weight included all main effects, the two-way interactions: sex * karyotype, population * sex, population * karyotype, population * substrate, substrate * karyotype, and the three-way interaction: population * substrate * karyotype. Comparing the best model with the next most simplified model gave a χ2 value of 43.3 (P = 0.00073). The significance of our terms (determined with a term dropping approach) can be seen in Table 2A.

Table 2 Best models for adult weight and development time

We also used a model building approach for development time. The best model for development time included the main effects and the two-way interaction karyotype * sex. Comparing the best model with the next most simplified model gave a χ2 value of 8.6 (P = 0.0135). The significance of our terms (determined with a term dropping approach) can be seen in Table 2B.

The best models for both weight and development time included all main effects and a subset of interactions (Table 2). However, a G × E interaction (as evidenced by a population × substrate interaction) was only present for weight, not for development time.

We used a modified signed-likelihood ratio test (MS-LRT) to experimentally test if there was a difference in variation between males and females. For all measured traits, the Krishnamoorthy and Lee’s (2014) modified signed-likelihood ratio test (MS-LRT) showed that males had significantly more phenotypic variation than females (Table 3).

Table 3 Results from Krishnamoorthy and Lee’s (2014) modified signed-likelihood ratio test (MS-LRT) looking to see if the coefficient of variation is different between males and females

This variation could be partially accounted for by karyotype. Previous work has shown that males are on average bigger than females and that within male size differences (but not female size differences) are primarily driven by the αβ inversion system: αα males are the biggest followed by αβ and ββ (Butlin et al. 1982). This held true in our male data set (Fig. 2) but the magnitude of these differences depended on both population and substrate type (Fig. 2, Supplementary Figs. 1–3). Although the relative sizes and size differences of the karyotypes differed between combinations of location and substrate, the pattern of both leg length and wing length, thorax width, and weight (Fig. 2, Supplementary Figs. 1–3) in relation to karyotype and sex was clear and consistent for males, but much less pronounced for females. Likewise, across treatments (combination of population and substrate), αα-males had the largest mean size, followed by αβ-males and ββ-males in all different population × substrate combinations (Fig. 2). For females, less than half of the combinations of population and substrate followed the same pattern.

While males had more variation than females for development time karyotype did not seem to play a large role in explaining this variation (Fig. 3). While αα-males generally took longer to develop, this was not always the case for every combination of population and substrate. For example, ββ flies from Steninge, raised on Kämpinge substrate took longer to develop than any other flies from Steninge.

Inversion types

The frequencies of αα, αβ, and ββ were significantly different across the Population × Substrate combinations (χ2 = 71.5, df = 30, P < 0.0001). Generally αβ was the most frequently observed karyotype, but ββ could be just as frequent or more frequent depending on the substrate. In ¾ of the populations (all except Steninge) ββ was the most frequent karyotype in at least one substrate and that was usually the substrate from Mölle.

We used Dirichlet regression to see which terms best predicted the karyotype frequencies. With this model we were not able to examine F-values so we compared all possible Dirichlet models with the ‘Anova’ function from the R package car (Fox and Weisberg 2011). The best model was population + substrate + population * substrate. Comparing this with the next most simplified model gave a χ2 value of 73.99 (P = 0.0002).

Discussion

Our results demonstrate that the environment modulates both the frequencies of the karyotypes as well as their phenotypic effects in C. frigida. The presence of a significant G × E term for both of these indicates that there is population specific variation in larval fitness of the different inversion types as well as in the phenotypic effects of the inversion. The same pool of eggs was used to seed every substrate treatment per population so the neutral expectation (without sampling bias) would be similar karyotype frequencies across substrates. Instead, we found strong differences in karyotype frequencies between substrates with some trends being consistent across multiple populations (ex: higher frequency of ββ in Mölle substrate, Fig. 4). While it is possible that these results are due simply to sampling bias, it is unlikely that sampling bias would produce consistent trends. Furthermore, this result confirms previous research showing that the substrate composition has a strong selective effect on the inversion (Butlin and Day 1984; Day et al. 1983; Edward and Gilburn 2013; Mérot et al. 2018). Karyotypic specific survival differed across populations suggesting that that population specific variation has accumulated in the inversion. For example, ββ was the most common karyotype on Mölle wrack for every population except Steninge (Fig. 4). Instead, in Steninge, little variance was seen between substrate combinations compared to other populations.

Fig. 4
figure 4

Box plot of frequencies of different karyotypes (black—αα, grey—αβ, and white—ββ) across populations (listed at the top of each graph) and substrates (listed on the x-axis)

Our results also indicate that inversion-related size effects in C. frigida are affected by the environment, particularly in males as females show little to no effect of karyotype on size. Male fly inversion phenotypes from the same population could vary two-fold for weight when grown on the same substrate (e.g. Steninge population on Mölle substrate), and three-fold when comparing male inversion phenotypes across all four growing substrates (e.g. Steninge, Fig. 2). Male size has a high heritability with virtually all the variance in size being attributable to the chromosomal inversion system (Wilcockson et al. 1995). In all of our treatment combinations, environmental effects never swamped the effect of the inversion. Male αα were always the biggest, followed by αβ and ββ, which is consistent with previous work (Fig. 1a; e.g. Butlin et al. 1982). Population and substrate changed the magnitude of these differences as well as size averages but never the pattern of the inversion size effects on males. Part of this variation may be due to parental effects as the parents were reared in their “home” substrate. Parental larval diet has been shown to influence offspring size in Drosophila melanogaster (ex: Valtonen et al. 2012; Vijendravarma et al. 2010). However, despite strong environment and inversion effects on male size, the inversion, the population, and the environment exhibited negligible size effects in females. Specifically, our data show much lower between karyotype variation in size and considerably less variation across environments for females. Thus, if maternal effects are present they are not influencing males and females in a similar manner. A similar difference in effect size between males and females has also been found in a common garden study on C. frigida grown on artificially mixed substrates containing in excess either the seaweed genus Laminaria or Fucus (Edward and Gilburn 2013). The study by Edward and Gilburn (2013) also found that flies grown on an excess of Laminaria showed greater among inversion size differences compared to Fucus, suggesting that the type of seaweed itself, or the associated microbiome of the seaweed, causes phenotypic differences in size. The wrackbed composition differs markedly along the populations studied, as a consequence of the declining salinity from the North Sea to the Baltic Sea (See Figs. 1, 2 in Wellenreuther et al. 2017), and have likely been major selective environmental factors in modulating C. frigida populations. Development time, another trait known to be affected by inversion type, also showed significantly higher variation across male karyotypes, but exhibited less extreme environmentally induced effects compared to size, and consequently no significant G × E effects. These results together with previous studies (Butlin and Day 1984; Day et al. 1980) underscore that the inversion related phenotype effects in C. frigida are significantly affected by sex, with male phenotypes being more labile compared to female phenotypes.

Interactions between genotypes and the environment are commonly observed in quantitative traits (Gupta and Lewontin 1982), particularly those associated with fitness (e.g. Fanara et al. 2006), and are thought to provide a potent force for maintaining genetic variation (Fernández Iriarte and Hasson 2000; Gillespie and Turelli 1989; Hedrick 1986). The habitat used by C. frigida is heterogeneous and the resulting local population structure is similar to Levene’s migration model of subdivided populations, namely by a single pool of individuals that mate at random and breed on discrete and ephemeral resources. Environmental fluctuations in seaweed substrate composition likely favors the persistence of alternative rearrangements (i.e. the α/β inversion) containing sets of co-adapted alleles via balancing selection (Carroll et al. 2001; Filchak et al. 2000; Weinig 2000), with each set being optimal under different conditions (Dobzhansky 1970). This hypothesis may account for the so called “supergene” idea (Schwander et al. 2014). Balancing selection resulting from seasonal and stochastic temporal and spatial wrackbed changes may thus be a significant contributor to maintaining the fitness related inversion variation in size. Other examples for selection on fitness trade-offs come from introduced apple Malus pumila and native hawthorn Crataegus spp. hosts which has led to divergence in phenology and thermal physiology in the true fruit fly Rhagoletis pomonella (Filchak et al. 2000).

Balancing selection has also been implicated in the maintenance of other inversion polymorphisms, but whether balancing selection can maintain genetic polymorphisms over long time scales is still debated (Bernatchez 2016). Theoretical studies indicate that balancing selection may primarily be a short-term process because the genetic load that it creates then has a negative feedback that hinders the long-term maintenance of adaptive alleles (Fijarczyk and Babik 2015). Furthermore, rapid environmental changes may cause dramatic allele frequency shifts that could cause the loss of formerly stable polymorphisms (Fijarczyk and Babik 2015). Despite this, recent studies have provided some evidence that balancing selection can operate even over longer time scales. Lindtke et al. (2017) investigated the inversion related cryptic colour polymorphism in Timema cristinae walking stick insects using a comprehensive dataset of natural populations and population genetic analyses. They were able to demonstrate that inversion colour variation has likely been maintained over extended time periods by balancing selection, which has previously been thought to be only possible over short time frames. Balancing selection (via spatially variable selection) on chromsomal inversions has been widely demonstrated in Drosophila species (Wellenreuther and Bernatchez 2018). The most well studied of these is the ln(3R)Payne inversion in Drosophila melanogaster, which shows stable parallel clines pertaining to latitude on 3 continents (Kapun et al. 2016). In Drosophila pseudoobscura many different arrangements on the third chromosome form clines across the Southwestern United States that are maintained by a migration selection balance between different environmental niches (Schaeffer 2008).

In conclusion, our results indicate that the inversion in C. frigida likely evolves via a combination of local mutation, G × E effects, and differential fitness of karyotypes in heterogeneous environments. Future work should apply high-resolution genome-wide approaches that target the regions inside the inversion to identify the true genic targets of varying selection. This approach would also elucidate which genes inside the inversion are linked to these large size effects in males, and how the environment has shaped the differentiation between these genes along gradients.