There is widespread interest in the genetic structure and variability of natural populations, but few studies have addressed the effects of contaminants in marine environments on population genetics (for review see e.g. Belfiore and Anderson 2001; Medina et al. 2007). The evolutionary response of populations can be surprisingly rapid, about a quarter of the rate of ecological change (Hairston et al. 2005), and environmental shifts caused by humans are often more rapid than naturally occurring changes and thus require a faster response from populations (e.g. Hendry et al. 2008; Palumbi 2001; Stockwell et al. 2003). When a population is exposed to a new environment with a novel selective pressure caused by anthropogenic contaminants, it could respond in different ways. The population could decrease drastically, causing a genetic bottleneck that results in lowered genetic diversity (Bickham et al. 2000). Some alleles could be beneficial and increase in frequency in the new environment (Wirgin and Waldman 2004), or genetic variation could actually increase due to a high mutation rate caused by genotoxic contaminants (Dubrova et al. 1996; Ellegren et al. 1997). In a recent meta-analysis of the impact of pollution on marine environments, Johnston and Roberts (2009) noted that pollution was never associated with the complete exclusion of life from a location and that commonly 50–70% of species were able to tolerate the contaminant load.

Rare alleles, which under normal circumstances are neutral or slightly deleterious, may increase individual tolerance to a novel environmental stressor (Bell and Collins 2008). Such alleles can accrue fitness benefits in the new environment and, through natural selection, later become common in the population. This appears to have happened when oceanic three-spined sticklebacks (Gasterosteus aculeatus L.) became trapped in freshwater habitats. The EDA allele responsible for lateral plate reduction is rare (1%) in oceanic populations of sticklebacks but common in freshwater habitats. Due to the strong selection pressure favoring this allele in freshwater environments, its frequency increases dramatically in freshwater (to 100%), and it is thus critical to the repeated occurrence of locally adapted freshwater stickleback populations (Barrett et al. 2008; Colosimo et al. 2005; Hohenlohe et al. 2010; Mäkinen et al. 2008).

In the Baltic Sea, high concentrations of pollutants in the water and sediment (Elmgren 1989; Fonselius 1972) may drive the evolution of resistance to contamination and, as in evolutionary change in general, the genetic variation available at loci involved in resistance is critical to the rate of evolution and the magnitude of resistance. When populations of Attheyella crassa (Crustacea) were exposed to contaminated coastal Baltic Sea sediment, mortality increased, levels of genetic variation decreased, and the experimental populations diverged genetically (Gardeström et al. 2008). These experiments highlight the complicated interaction between selection and random genetic drift caused by environmental change. To date, only a few studies have conclusively demonstrated the evolution of increased genetic tolerance to contaminants in natural populations (Williams and Oleksiak 2008; Wirgin and Waldman 2004).

It has been shown that G. aculeatus respond to anthropogenic disturbance (eutrophication) with plastic alterations in their behavior, such as enhanced reproductive success, increased courtship activity and an increased cost of courtship for males (Candolin 2009). Hence, eutrophication relaxes the opportunity for sexual selection on several traits, which could increase the opportunity for natural selection (Candolin 2009). Further, it has been known for over three decades that effluents from paper and pulp mills affect fish reproduction, causing decreased gonad size, altered expression of secondary sex characteristics and reduced fecundity (for review see Hewitt et al. 2008). For example, female mosquito fish, Gambusia sp., downstream of pulp and paper mill outlets in Florida showed masculinized development and decreased embryo production (Orlando et al. 2007). Female G. aculeatus exposed to pulp mill effluents have been shown to undergo masculinization and induction of the male-specific protein spiggin (Katsiadaki et al. 2002). Masculinization of eelpout, Zoarces viviparus, with a lowered percentage of females, has been reported close to a pulp mill effluent in the Baltic Sea (Larsson et al. 2000). It has been suggested that androgenic steroids derived from microbial degradation of phytosterols originating from the wood, such as androstenedione, androstadienedione and progesterone, are responsible for the masculinization of fish observed in the receiving waters of pulp and paper industry effluents (Denton et al. 1985). Pulp mill effluents have also been shown to exhibit estrogenic effects in fish. Elevated plasma vitellogenin levels have been observed in rainbow trout, Oncorhynchus mykiss, exposed in vivo to bleached mill effluents (Tremblay and van der Kraak 1999; van den Heuvel and Ellis 2002), and in juvenile whitefish, Coregonus lavaretus, exposed to effluents from a chlorine-free bleaching process (Mellanen et al. 1999). Still, Pettersson et al. (2007) could not detect any estrogenic or androgenic disruption in juvenile three-spined stickleback in the receiving waters of a pulp mill on the Swedish Baltic Sea coast.

Patterns of genetic diversity between populations at multiple genetic loci are often used to detect loci under selection in genome scans (for a review see Luikart et al. 2003). Loci involved in local adaptations should show high FST values compared to the global average FST (e.g. Bonin et al. 2006; Excoffier et al. 2009; Wilding et al. 2001; Shimada et al. 2011). We have used AFLP (Vos et al. 1995) as it yields primarily neutral genetic markers from multiple locations in the genome, making it possible to find loci that are under selection and to detect adaptation to habitat differences and environmental change (e.g. Bensch and Åkesson 2005; Campbell and Bernatchez 2004; Wilding et al. 2001).
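As a minimal illustration of the quantity compared in such genome scans, the sketch below computes Wright's FST for a single biallelic locus as (HT − HS)/HT from population allele frequencies. It assumes equal sample sizes and known allele frequencies, which is a simplification: dominant AFLP markers require the allele frequencies to be estimated first. The function name and the example frequencies are hypothetical and are not taken from the study.

```python
def fst_biallelic(allele_freqs):
    """Wright's FST for one biallelic locus.

    allele_freqs: frequency of one allele in each population
    (equal sample sizes assumed; this is an illustrative simplification).
    """
    n = len(allele_freqs)
    # Mean expected heterozygosity within populations (HS)
    hs = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    # Expected heterozygosity of the pooled population (HT)
    p_bar = sum(allele_freqs) / n
    ht = 2.0 * p_bar * (1.0 - p_bar)
    if ht == 0.0:
        return 0.0  # locus is fixed everywhere: no differentiation to measure
    return (ht - hs) / ht


# Hypothetical loci: one with similar frequencies in both populations,
# one strongly differentiated (a candidate for a high-FST outlier).
neutral_like = fst_biallelic([0.45, 0.55])
divergent = fst_biallelic([0.2, 0.8])
```

A locus whose FST is far above the average across all loci, as `divergent` is relative to `neutral_like` here, is the kind of locus a genome scan would flag for further testing.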

Here we sampled G. aculeatus near four different paper pulp mills in the Baltic Sea (three in the Baltic proper and one in the Gulf of Bothnia) and at four adjacent reference sites. The aim of the study was to examine whether pollution from point sources could act as a selective agent and drive local adaptation. We sought to test whether G. aculeatus sampled at polluted sites differ in their genotypic composition from fish sampled at nearby reference sites, and whether polluted environments at different geographical sites constitute a selective regime that is more similar among polluted habitats than between polluted and reference habitats. We also wanted to evaluate whether levels of genetic variation within populations differ between contaminated and reference sites. Finally, we investigated the possibility of identifying genetic loci under directional selection using a genome scan approach.


All samples (Table 1) were collected in 2003 in a nested design with one reference site south of every polluted site. Our study thus contrasts two habitat types: polluted sites located close to the main outlet of the pulp mill (1–5 km) and reference sites less influenced by the point source, located 7 to 50 km south of the polluted site (Fig. 1 and Table 2). The sites were chosen based on information from local authorities' environmental assessment schemes. At the sites in southern Sweden, adult fish, both females and males, were sampled in late April during the breeding season, and in northern Sweden fry were sampled in August. Fish were caught with drop nets and trap nets, depending on habitat conditions, in shallow water at 1 to 2 m depth. A small piece of the tail fin (2 × 2 mm) was collected and stored in 99% ethanol, and the fish were then released.

Table 1 Sampling sites, abbreviation for sampling site (Pop), habitat (reference or polluted), name of sampled pulp mill, number of sampled individuals from each site (N) and calculated genetic variation within sites (He) analogous to Nei’s gene diversity with standard errors, from software AFLP-SURV. Mean Hs (Nei’s gene diversity) with standard errors for all populations, polluted populations and reference populations at the bottom right corner of the table (here N = number of sites)
Fig. 1

Sampling sites along Sweden's eastern coast in the Baltic Sea. Three different geographic areas (a, b and c) were sampled. In area a, in the Gulf of Bothnia, one pulp mill, Uta-P, and one reference site, Uta, were sampled. In area b we sampled the Mönsterås pulp mill, Mon-P, and one reference site, Mon. Further south, in area c, we sampled two pulp mills, Nymölla, Nym-P, and Mörrum, Mor-P, and two reference sites, Nym and Mor

Table 2 All sample sites, above diagonal geographical distance between sites in kilometers measured as shortest possible way in water, below diagonal mean pairwise FST from software AFLP-SURV, significance of FST tested with 1000 permutations, * P < 0.05, ** P < 0.01

The Baltic Sea is a brackish inland sea with a salinity gradient stretching from south to north; it lacks visible tides and strong current patterns (Elmgren 2001). Salinity at the sampling sites in the Baltic proper (sites Nym/Nym-P, Mor/Mor-P, Mons/Mons-P; see Fig. 1) was 6.7 psu at the time of sampling (SHARK database); in the Gulf of Bothnia (site Uta/Uta-P) salinity is much lower, around 2 psu. All pulp mills in this study have been active for more than 40 years. The site at the Mönsterås pulp mill has been extensively studied. The effluent is subject to activated sludge treatment, and it enters the sea via a 5-km effluent tube with a 1.5-km diffuser at a water depth of 9–12 m, into fairly well ventilated waters. The final discharge is approximately 35 m³ per ton of pulp produced. The plume from the mill can reach at least 3.8 km south or 2.4 km north (1:500 v/v dilution), depending on current conditions (Landner et al. 1994). At one of the pulp mills (Nymölla), the activated sludge plant and the ultrafiltration plant had a notable effect on the emissions of organic matter into the Baltic Sea: emissions measured as chemical oxygen demand (COD) declined from 45000 metric tonnes/year in 1993 to 13000 in 2001 (HELCOM 2004). Due to changes in the production process, the environmental impact of the discharge has been lowered continuously since the mid-1980s and the impact is fairly local (0–3 km). Moreover, most of the effects on the local ecosystem seem to be related to eutrophication (Landner et al. 1994).

Genetic analysis

Total genomic DNA was isolated from a 2 × 2 mm tail fin clip as described by Laird et al. (1991). Approximately 25 mg of fin was incubated for 4 h at 56°C with 100 μl lysis buffer and 0.03 mg proteinase K in a total volume of 103 μl. DNA from the supernatant was extracted with 99.5% ethanol and centrifuged, and the pellet was washed with 70% ethanol. DNA was dissolved in 1× TE buffer and kept at 4°C. The DNA concentration was determined using a Nanodrop© ND-1000, and the DNA was then diluted to a working concentration of 50 ng/μl.

AFLP markers were generated as described by Vos et al. (1995) with minor modifications as described by Bensch et al. (2002). Prior to AFLP analyses, samples were randomized in relation to sampling site to minimize between-batch variation in the PCR reactions (Bensch and Åkesson 2005). Genomic DNA (250 ng) was cut with EcoRI (5′-G↓AATTC-3′ MBI Fermentas) and Tru1I (5′-T↓TAA-3′ MBI Fermentas), and after digestion E-adaptor (5′-CTCGTAGACTGCGTACC-3′, 3′-CATCTGACGCATGGTTAA-5′ MWG Biotech) and M-adaptor (5′-GACGATGAGTCCTGAG-3′, 3′-TACTCAGGACTCAT-5′ MWG Biotech) were ligated to the fragments. In pre-amplifications we substituted 45% of the ddH2O with bovine serum albumin (1 μg μl−1) and used E-primer (5′-GACTGCGTACCAATTCT-3′) and M-primer (5′-GATGAGTCCTGAGTAAC-3′). PCR products were diluted 10 times in ddH2O and stored at −20°C. In selective amplification, 12% of the ddH2O was substituted with bovine serum albumin (1 μg μl−1). We used three different primer combinations, each with three additional bases at the 3′-ends: M-primer (5′-GATGAGTCCTGAGTAANNN-3′) and E-primer (5′-GACTGCGTACCAATTCNNN-3′) labeled at the 5′-end with HEX. The primer combinations were Etct/Mcta, Etct/Mcac and Etag/Mcac. Approximately 10% of the samples were duplicates. DNA fragments were separated on an ABI-3730XL capillary electrophoresis unit at Uppsala Genome Center with separation medium POP-7™ Polymer (Applied Biosystems), size standard GeneScan™ 500 ROX™ (Applied Biosystems), injection time 15 s (1.6 kV), run time 1600 s and array length 50 cm. Data were scored in GeneMapper 3.0 (Applied Biosystems); the analysis range was set to 150–500 bp, bin width to 1 bp and locus selection threshold to 200 RFU, using default settings with normalization within project. We manually checked that duplicate samples yielded the same genotypes. Each primer combination was scored separately in GeneMapper 3.0 and the three genotype matrices were combined into one dataset.
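The last two steps, checking duplicates and combining the three genotype matrices, can be sketched as follows. This is an illustrative outline only, not the procedure used in the study (where duplicates were checked manually): the function names, sample identifiers and band scores are hypothetical, and each matrix is assumed to be a mapping from sample ID to a list of 0/1 band scores.

```python
def merge_matrices(matrices):
    """Concatenate band scores from several primer combinations per sample.

    Only samples scored in every matrix are kept (an assumption of this sketch).
    """
    shared = sorted(set.intersection(*(set(m) for m in matrices)))
    return {s: [band for m in matrices for band in m[s]] for s in shared}


def mismatch_rate(genotype_a, genotype_b):
    """Fraction of loci at which two replicate profiles disagree."""
    if len(genotype_a) != len(genotype_b) or not genotype_a:
        raise ValueError("profiles must be non-empty and of equal length")
    diffs = sum(1 for x, y in zip(genotype_a, genotype_b) if x != y)
    return diffs / len(genotype_a)


# Hypothetical scores for two fish and three primer combinations
etct_mcta = {"fish1": [1, 0], "fish2": [0, 0]}
etct_mcac = {"fish1": [1], "fish2": [1]}
etag_mcac = {"fish1": [1, 0], "fish2": [0, 1]}

dataset = merge_matrices([etct_mcta, etct_mcac, etag_mcac])
```

A mismatch rate of zero between a sample and its duplicate corresponds to the congruent genotype patterns required before the matrices are combined.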

Statistical analyses

Estimates of genetic variation within sites (He and Hs) were obtained from the software AFLP-SURV 1.0 (Vekemans 2002) using the approach of Lynch and Milligan (1994), assuming Hardy–Weinberg equilibrium and using the Bayesian method with a non-uniform prior distribution of allele frequencies (Bensch and Åkesson 2005; Zhivotovsky 1999); 1000 permutations were used to test the significance of FST. Genetic distances (FST) were combined in a matrix with geographic distances to test for isolation by distance with a Mantel test in Isolation by Distance Web Service version 3.16 (Jensen et al. 2005). We performed a principal coordinate analysis (PCoA) using the capscale procedure in the Vegan package (Oksanen et al. 2006) in R 2.5.1 (R Development Core Team 2007), and a constrained PCoA with habitat (viz. polluted and reference sites) as a constraint. This is similar to a redundancy analysis but allows non-Euclidean dissimilarity indices (here we used Jaccard distances). Genetic structure was further investigated with the Bayesian approach in STRUCTURE 2.2 (Pritchard et al. 2000); burn-in was set to 50000 with 70000 additional cycles. Each run was iterated 3 times, with the number of clusters (K) set from 1 to 10, assuming admixture and uncorrelated allele frequencies. Structure Harvester on the Web, version 0.56.4 (Earl 2009), was used to evaluate K (Evanno et al. 2005). To test whether the data separate between the two habitats, we performed a locus-by-locus Analysis of Molecular Variance (AMOVA) in the software Arlequin 3.0 (Excoffier et al. 2005), which assumes that loci are unlinked and correctly adjusts sample sizes for each locus. We also tested FST values for each locus to find FST-outlier loci by simulations as implemented in the Arlequin package (version …). Populations were grouped according to geography (Nym/Nym-P, Mor/Mor-P, Mons/Mons-P, Uta/Uta-P, Fig. 1) and we simulated 50 groups with 100 demes each for 50000 runs. This analysis simulates evolution in a hierarchical set of populations according to the Wright island model. Coalescent simulations are used to obtain a null distribution of locus-specific FST values and confidence intervals around the observed values, to test whether observed FST values can be considered outliers in relation to the overall observed FST value for each locus (Excoffier et al. 2009). In order to further test the effect of different selection between polluted and reference sites, the identified FST-outlier loci were removed from the dataset and the data were re-analyzed. To test the robustness of the FST-outlier analysis, the data were divided into the two habitat types and analyzed with the random forest package in R 2.5.1 (R Development Core Team 2007), an algorithm for classification developed by Breiman (2001), to identify which loci contributed most to the difference in genotypes between habitats.
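Because AFLP markers are dominant, heterozygotes cannot be scored directly, and allele frequencies must be inferred from the proportion of individuals lacking a band under the Hardy–Weinberg assumption. The sketch below shows the simplest version of this idea, the square-root estimator, and not the Bayesian estimator of Zhivotovsky (1999) used in AFLP-SURV; it also omits the small-sample correction for expected heterozygosity. Function names and the example profile are hypothetical.

```python
import math


def null_allele_frequency(band_presence):
    """Square-root estimate of the recessive (band-absent) allele frequency q.

    band_presence: 0/1 scores for one locus in one population.
    Under Hardy-Weinberg equilibrium, band-absent individuals are the
    recessive homozygotes, expected at frequency q**2.
    """
    n = len(band_presence)
    if n == 0:
        raise ValueError("no individuals scored")
    absent = sum(1 for b in band_presence if b == 0)
    return math.sqrt(absent / n)


def expected_heterozygosity(q):
    """Expected heterozygosity 2pq at a biallelic locus (no sample-size correction)."""
    return 2.0 * q * (1.0 - q)


# Hypothetical locus: band absent in 1 of 4 fish, so q = sqrt(0.25)
q_hat = null_allele_frequency([1, 1, 1, 0])
he_hat = expected_heterozygosity(q_hat)
```

Averaging such per-locus values over all loci within a site gives a quantity analogous to the He reported in Table 1, although the Bayesian estimator is less biased when band-absent individuals are rare.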


The three primer combinations resulted in a total of 248 polymorphic AFLP-loci which all were used for statistical analyses. Primer combination Etct/Mcta gave 80 loci, Etct/Mcac 79 loci and Etag/Mcac 89 loci. All duplicate samples gave a congruent genotype pattern.

The AFLP-SURV analyses showed a distinct genetic structure, with 12% of the genetic variation found among populations (FST = 0.12, P < 0.001); among polluted populations only, the corresponding value was 13% (FST = 0.13, P < 0.001) and among reference sites 14% (FST = 0.14, P < 0.001) (Table 2). Genetic diversity within populations (He) did not differ between the two habitat types (Table 1). All pairwise FST values except those between Uta-P and Uta and between Mor and Nym were significant (Table 2). There was isolation by distance, with genetic distance (FST) increasing with geographic distance (r = 0.95, P = 0.02). This pattern remained after removing 13 FST-outlier loci (see below) (r = 0.92, P = 0.01). Polluted and reference sites showed isolation by distance to a similar degree, but the isolation by distance patterns were not significant when tested for each habitat group separately.
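The isolation-by-distance result rests on a Mantel test, which correlates the pairwise genetic and geographic distance matrices and assesses significance by permuting site labels. The following is a minimal sketch of that procedure, not the Isolation by Distance Web Service used in the study; the function names, the seed and the number of permutations are assumptions chosen for illustration.

```python
import random


def upper_triangle(matrix):
    """Values above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(genetic, geographic, permutations=999, seed=1):
    """One-sided Mantel test: returns (r, P) for a positive association."""
    rng = random.Random(seed)
    n = len(genetic)
    geo_vec = upper_triangle(geographic)
    r_obs = pearson(upper_triangle(genetic), geo_vec)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together
        permuted = [[genetic[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), geo_vec) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations
    return r_obs, (hits + 1) / (permutations + 1)
```

With only eight sites the number of distinct permutations is limited, which is one reason a strong correlation can still carry a comparatively modest P value, and why the test loses power when each habitat group of four sites is analyzed separately.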

The data showed a separation by habitat in the PCoA (P < 0.005, permutation test, Fig. 2a). The first axis separates populations on a geographic scale and explains 60% of the genetic variation. The second axis separates populations by habitat and explains 12% of the variation: polluted sites in the Baltic proper form one group and the reference sites in the Baltic proper form another. When 13 FST-outlier loci (see below) were removed, the separation by habitat disappeared and the data mainly separated by geography (Fig. 2b). The constrained PCoA, using habitat as constraint, showed a significant difference in genotype composition between polluted and reference sites (permutation test P < 0.005, Fig. 2c). STRUCTURE analyses showed that K = 6 had the highest probability (mean LnP K6 = −16659, SD 18.9, Fig. 3a). In one cluster, cluster 3 (Fig. 3b), the two sites from the Gulf of Bothnia (Uta and Uta-P) are dominant. Cluster 1 consisted mostly, though not exclusively, of fish collected at polluted sites, and cluster 2 was more frequent at reference sites.
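The PCoA operates on pairwise Jaccard distances between individual AFLP profiles. For two binary band profiles, this distance is one minus the number of bands shared by both individuals divided by the number of bands present in at least one of them, so shared band absences are ignored. A minimal sketch follows, with a hypothetical function name and made-up profiles; it illustrates the distance only, not the ordination performed with capscale.

```python
def jaccard_distance(profile_a, profile_b):
    """Jaccard distance between two 0/1 band profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of loci")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    if either == 0:
        return 0.0  # no bands in either profile: treat as identical
    return 1.0 - shared / either


# Hypothetical profiles at four loci: one shared band out of three present
d = jaccard_distance([1, 1, 0, 0], [1, 0, 1, 0])
```

Ignoring shared absences is a common choice for dominant markers, since the absence of a band in two individuals is weaker evidence of similarity than its shared presence.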

Fig. 2

a Principal coordinate analysis based on 248 Amplified Fragment Length Polymorphism (AFLP) loci from a total of eight sites: four polluted sites sampled outside pulp mills (Mons-P, Mor-P, Nym-P and Uta-P) and four nearby reference sites. All sites are significantly separated (P < 0.005, permutation test). The first principal coordinate axis explains 59.6% of the variation and the second principal coordinate axis 12% of the variation. Site names in the plot indicate the centroid of every population. b Principal coordinate analysis where loci identified as FST-outliers, and thereby as under positive selection (comparing pairs of polluted and reference sites), have been removed. Site names in the plot indicate the centroid (mid-point) of populations. c Constrained principal coordinate analysis with habitat as constraint. Habitat types (polluted and reference) differed significantly in a permutation test (P < 0.005). In the plot, * indicates the centroids of the grouped reference sites and the grouped polluted sites, respectively

Fig. 3

Results from STRUCTURE analyses. a Mean L(K) (±SD) plotted for K = 1–10, where K is the number of genetic clusters (no prior information, under the assumption of admixture and correlated allele frequencies). The plot shows that K = 6, six clusters, has the highest probability. b Assignment of individuals from the eight sampling sites to each of the six clusters. Each shade of grey, black and white illustrates one cluster. The y-axis shows the proportion of every population assigned to the six genetic clusters

The locus-by-locus AMOVA showed that the two habitats were significantly separated (FST = 0.021, P < 0.01, 1023 permutations). The FST-outlier analysis detected 13 AFLP loci indicated to be under positive selection, with FST values above the 99% quantile (Fig. 4). Most of these loci (11) were from one primer combination (Etct/Mcac); the other two originated from the Etag/Mcac primer combination. Further, 21 loci were identified as FST-outliers above the 95% quantile. Twenty-eight loci were indicated to be under stabilizing selection, having lower FST than expected (P < 0.01, Fig. 4). There was some congruence between the FST-outlier analysis and the classification analysis performed with random forest: 5 of the 20 loci ranked as most important for separating the two habitats were also identified by the FST-outlier analysis as lying above the 0.95 quantile (Fig. 5). Four of the five most informative loci in the classification analysis were significant (P < 0.05) in the FST-outlier analysis. However, an unexpectedly high number of the most significant FST-outlier loci were not identified by the classification analysis.

Fig. 4

FST-outlier analysis for the 248 polymorphic AFLP loci after grouping the matched polluted/reference pairs into four groups according to geography (see text). The contour lines indicate the P < 0.05 and P < 0.01 quantiles, corresponding to the 95 and 99% confidence intervals. Observations outside these ranges are considered to have significantly higher or lower FST than expected under a neutral model, indicating directional or stabilizing natural selection, respectively

Fig. 5

Output from the Random Forest classification analysis, showing the highest-ranked loci affecting the classification of individuals. Data were divided into the two habitat categories (polluted and reference) and the 20 loci of highest importance for separating the categories are ranked. Marked loci were significant in the FST-outlier analysis above the 0.95 (*) and 0.99 (**) level


We found that the genetic composition of multiple populations of G. aculeatus has responded to the directional selection pressure from the effluents of paper pulp mills in the Baltic Sea. Pairs of polluted and clean sites in the Baltic proper all had significant FST values between them (Table 2). Populations sampled at polluted sites were separated from populations at nearby reference sites (Fig. 2a). The genetic difference between polluted and reference sites was further confirmed when habitat was set as a constraint, removing the genetic variation caused by geography (Fig. 2c). Yet most of the genetic structure in our sample could be explained by geographic distance, and nearly all sites in the Baltic proper, even though separated by as little as 7 km, were significantly differentiated (Table 2). This clear geographic structure contrasts with the findings from microsatellite data presented by Mäkinen et al. (2006) and Cano et al. (2008), who found a very weak population structure in the Baltic Sea even when sites in the Baltic proper and the Gulf of Bothnia were included.

Genetic variation was not lower in the populations from polluted sites, as populations from both habitats had similar levels of heterozygosity (Table 1). Nor were there any differences in genetic separation between populations when comparing polluted and reference sites, as the global FST values of the two groups are similar. Loss of genetic variation in populations exposed to contaminants has been shown in several studies (e.g. Murdoch and Hebert 1994; Street and Montagna 1996). In this study, we sampled wild populations in an environment without physical barriers to migration, and it is most likely that there is gene flow between polluted and clean sites, which could compensate for any loss of genetic variation at neutral loci in populations at polluted sites. The same pattern was observed in wild populations of Fundulus heteroclitus living in heavily PCB-contaminated harbors compared with both moderately contaminated and clean sites: genetic diversity did not differ between sites even though the fish in contaminated habitats were shown to differ to a great extent in PCB tolerance (McMillan et al. 2006).

We identified FST-outlier loci that were statistically different from a neutral distribution and therefore were indicated to be under different selection pressures at polluted and reference sites (Fig. 5). When these loci were removed from our dataset, the differentiation according to habitat in the PCoA disappeared (Fig. 2a and b). This suggests that the selective pressure is in fact caused by habitat differences acting at these, or closely linked, loci. Williams and Oleksiak (2008) identified several outlier loci when comparing populations of F. heteroclitus from three polluted sites with reference sites and saw that the pattern of isolation by distance disappeared when positively selected loci were excluded. We did not observe the same pattern in our study: isolation by distance was present both in the whole dataset and after removing the FST-outlier loci. The incidence of loci indicated to be under directional selection in our study was 8.4% (21 out of 248) at the P < 0.05 level (Fig. 5). This is similar to the incidence earlier reported for nine-spined sticklebacks (Pungitius pungitius) in the Baltic Sea region in relation to salinity differences (Shikano et al. 2010). These numbers are relatively high compared to earlier studies of directional selection by genome scan methods (Shikano et al. 2010 and references therein). To identify genes subject to directional selection and responsible for local adaptation in the three-spined stickleback, Shimada et al. (2011) genotyped microsatellite markers located within, or closely linked to (<6 kb), target genes. They investigated directional selection for 157 genes with known physiological functions and found a high incidence (17%) of significant footprints of directional selection.

Not all loci identified by the FST-outlier analysis are likely to confer adaptation to a pulp mill influenced environment, as only a minority of the outlier loci were also highlighted by the classification analysis. The precise factors responsible for the putative selection will require further research, as will the role of the potential marker loci identified. The AFLP loci under selection could be investigated further: it is feasible to convert an AFLP fragment to a codominant genetic marker (Bensch et al. 2002) and then to use it to find candidate genes that could be involved in processes important for the organism's adaptation to the habitat. Moreover, although most AFLP-generated markers are expected to be unlinked (Vos et al. 1995), we cannot exclude the possibility of linkage between some of the identified loci. Linkage analysis of AFLP markers cannot be done reliably in the absence of pedigree data.

Our study showed that marine anthropogenic contamination can act as a selective agent and that it is possible to identify genetic loci involved in this differentiation. Moreover, this differentiation in genotypic composition has taken place in an open environment, a sea, and over a time period no longer than 45 years. However, we cannot rule out the possibility that non-toxic effects linked to pulp mill effluents, such as eutrophication, changes in vegetation, or changed predation pressure, are also of importance and may be responsible for the observed differences in genotype frequencies between the polluted and reference sites.