Background

In the marine realm, pelagic species that have large geographic range and high dispersal capabilities represent a serious challenge to the idea of allopatric divergence (i.e., a large continuous population broken up into smaller units by extrinsic barriers) and to speciation processes in a seemingly continuous environment [1]. The high mobility of these species and the dearth of barriers to gene flow in oceans might be expected to limit the division of species' ranges and, as a result, even distant regions might be connected genetically [1, 2]. Although examples of genetic homogeneity over large distances are common in marine systems, there are also many examples of surprising population structure in marine species with high dispersal potential [1, 37].

Cetaceans are good examples of this kind of species. Despite their broad range and their high dispersal capabilities, many cetaceans often show substantial genetic structure at regional or even fine scale, although the extent varies among species [8]. It is generally argued that these patterns, not always correlated with geographic features, are related to a combination of complex behaviours, such as philopatry, specialisations for local resources, or social organisation into kinship groups [8, 9]. On the other hand, while the dispersal and segregation of populations of terrestrial mammals are frequently influenced by geographic features or climatic characteristics, few such obvious barriers are expected to restrict cetacean dispersal and gene flow in the world's oceans [10, 11]. Variation in oceanographic properties of the water column, such as depth, temperature, currents and winds, are known as important factors in the life of these animals, most obviously in conditioning the availability of their food (for example, see [12]), but their effect on cetacean dispersal and on population structure remains enigmatic.

Small coastal cetaceans such as those of the porpoise family are a model of choice to investigate this issue because they have to face a suite of intrinsic problems not encountered by larger dolphins and whales. Their small size, their demanding reproductive schedule, and their limited ability to store energy force a strong dependency on their food [13, 14]. Therefore, we expect that variation in oceanographic features that determine food availability and abundance (i.e., bathymetry, temperature and primary productivity) should markedly affect local density and dispersal of porpoises. If true, their population genetic structure should correlate, at least partly, with oceanographic characteristics. To test this hypothesis, we examined the genetic structure of one the most widely distributed porpoises, the harbour porpoise Phocoena phocoena (L. 1758), with regards to seascape characteristics. Harbour porpoises occur fairly continuously throughout cold coastal waters of the North Pacific and the North Atlantic, with a relict population in the Black Sea separated from the Atlantic range by the Mediterranean Sea where porpoises are nowadays absent [1517]. We analysed genetic polymorphism at 10 microsatellite loci for an extensive sampling (n = 752) covering the main distribution of harbour porpoises in the central and eastern North Atlantic (Figure 1) using a combination of recent individual-based landscape genetic approaches [1821].

Figure 1
figure 1

Bathymetric map of the eastern North Atlantic showing the approximate geographic sampled locations and sample sizes per location. Geographic locations are based on GPS coordinates or reported discovery location. The bar scales delimit the latitudinal range of the three spatial scales considered for the analyses of the North Atlantic (NAt) cluster: the global scale (1); the middle scale, south (2A), and north (2B) parts; and the small scale, the south (3A), middle (3B), and north (3C) parts. The map is projected using a gnomonic projection centred on the sampling centroid (scale units in km).

Here, we provide strong and clear evidence that seascape imposes major constraints on the demographic and genetic structure of a cetacean, and thus on its dispersal. This finding is of general interest in the context of climate change and habitat fragmentation for marine species, as ecosystems in the eastern North Atlantic are shifting toward a warmer dynamic equilibrium with significant changes already detected in plankton and fish assemblages.

Results

We applied two complementary Bayesian clustering algorithms, namely Structure v.2.1 [18, 19] and Geneland v.1.0.7 [20], to infer population structure (i.e., a number of clusters, K) and to assign individuals (probabilistically) to populations (or clusters) based on individual multilocus genotypes and, for the second algorithm, also on individual spatial origins. Both of these approaches assume that populations are panmictic units with distinct allele frequencies. To test whether individual dispersal is restricted in space, we analysed the pattern of isolation by distance (IBD) using the individual-based approach developed by Rousset [21]. This involves regression of an index of genetic differentiation on marine geographic distance among pairs of individuals (see Methods). Finally, recent migration among populations (within the last few generations) was assessed using a Bayesian model implemented in BayesAss v.1.3 [22]. This algorithm requires few assumptions for assigning individual genotypes to population of origin and, in particular, relaxes the key assumption of Hardy-Weinberg (HW) equilibrium within populations.

Clustering analyses

Structure analysis

Structure provided consistent results over 10 replicated runs tested for each K and over the different models tested (see Methods). Generally, in highly structured data sets, as K is increased the most divergent groups separate into distinct clusters first [18, 23]. The probability of the data (LnPr(X|K)) greatly increased from K = 1 to K = 2, and then reached a maximum value at K = 3, after which the values decreased gradually (Figure 2a). The increase of likelihood (Δ(LnPr(X|K)); Figure 2b), i.e. the gain of explanatory power of the model when adding a new cluster to the analysis, is high at K changing from 1 to 2. At K = 2, the two clusters are anchored by the Black Sea (BS) and the North Atlantic porpoises (Figure 3). The addition of a third cluster (K = 3) further increases the probability of the data, the gain of power becoming null or negative for higher values of K (Figure 2b). At K = 3, the North Atlantic cluster splits in two distinct parts that persist and become more clearly distinct for higher values of K (Figure 3). The first is a genetically homogeneous cluster that encompasses porpoises from Spain and Portugal with high membership coefficients (Iberian cluster, IB). The second group is composed of the remaining individuals sampled further north (North Atlantic cluster, NAt). Most of these display membership coefficients that tend to distribute evenly across clusters others than the Black Sea and Iberian clusters as K is increased. The same pattern was observed whatever the model considered in the analysis. This pattern might result from (a) lack of sufficient signal in the data set to confidently assign these individuals, and/or (b) low underlying genetic structure of porpoises in that area, or (c) departure from the basic assumptions of the model. Instead of discrete genetic units at HW and linkage equilibrium, the population structure in northern waters might be much more continuous than discrete, with continuous gradations in allele frequency over the range (see below).

Figure 2
figure 2

Estimated number of populations from Structure (a and b) and Geneland (c and d) analyses. Structure analyses: (a) mean (± SD) probabilities of the data [LnPr(X|K)] over 10 Structure replicated runs plotted as a function of the putative number of clusters (K). (b) Mean variations of probabilities of the data (Δ(LnPr(X|K)) between successive K considered in Structure analyses. For K clusters, this variation is calculated as Δ(LnPr(X|K))=LnPr(X|K) k+1-LnPr(X|K) k . Geneland analyses: posterior density distribution of the number of clusters estimated from Geneland analysis in 7 out of 10 replicates (c) and in the 3 remaining trials (d).

Figure 3
figure 3

Estimated population structure from Structure analyses for K = 2 to K = 5. Each individual is represented by a thin horizontal line divided into K coloured segments that represent the individual's estimated membership fractions in K clusters. Black lines separate individuals from different geographic areas labelled on the right. Each plot, produced with Distruct [79], is based on the highest-probability run (of ten) at that value of K. Individuals are arranged based on their origins and sorted with increasing latitude.

Geneland analysis

While Structure uses only the individual multilocus genotype data to infer the population structure, Geneland also exploits the spatial positions of the individual samples as a supplemental parameter in the analysis [20]. Interesting features of the Geneland model that further distinguish it from that of the Structure model are its ability (a) to deal with an unknown number of populations simultaneously with other parameters, (b) to locate population boundaries across space, and (c) to account for uncertainty in the positioning of sampled individuals (see Methods and [20] for further details). This last feature is particularly useful in the present context as the locations of sampled harbour porpoises, composed of by-caught and stranded animals, might be poorly representative of the normal range of individuals.

The Geneland model provided results consistent with those of the Structure one. Posterior distributions of the estimated number of populations (K) across 10 replicates displayed a clear mode at K = 3 in 7 out of the 10 replicates (Figure 2c) and at K = 4 in the remaining trials (Figure 2d). Similar to the Structure results, Geneland identified three spatially coherent clusters (Figures 4 and 5): the first gathers all porpoises from the Black Sea and Marmara Sea (the BS cluster) isolated from those in the Atlantic by the Mediterranean (Figure 5a); the second gathers the porpoises from the Iberian peninsula (the IB cluster) isolated from samples further north by a barrier to gene flow located in the southern Bay of Biscay (Figure 5b); and the third is unequivocally composed of the samples further north in the Atlantic (the NAt cluster), widely distributed from the French coast of the Bay of Biscay to the Arctic waters of Iceland and Norway (Figure 5c). This last result contrasts slightly with that of the Structure analysis (compare Figures 3 and 4). While the Structure model did not confidently assign these individuals, Geneland assigned almost all them to the NAt cluster with high membership coefficients that remain consistent even for higher values of K (Figure 4; K = 4). This suggests that taking into account the spatial context of individuals might improve the efficiency of the analysis. No individuals were assigned to the fourth cluster detected in 3 out of the 10 Geneland replicates (Figure 4, K = 4, green colour). This is not surprising as this cluster is centred on landmass (not shown). Such occurrences of "ghost" populations, with no individuals assigned, is reported by Geneland's authors as a poorly understood problem [20]. It could be related to the process of tiling a heterogeneous sampling distribution, with "landmass" tiles being reported as a "ghost" population. As there are no individuals assigned to this cluster, as it only occurs in a minor proportion of the trials and as it does not affect biological interpretation in the present context, this "ghost" population can be ignored (as suggested by the Geneland authors [20]).

Figure 4
figure 4

Estimated population structure from Geneland analyses for the two modal solutions K = 3 and K = 4. Each individual is represented by a thin horizontal line divided into K coloured segments that represent the individual's estimated membership fractions in K clusters. Black lines separate individuals from different geographic areas labelled on the right. Each plot, produced with Distruct [79], is based on the highest-probability run at that value of K. Individuals are arranged based on their origins and sorted with increasing latitude.

Figure 5
figure 5

Maps of Geneland individual assignments to clusters for K = 4 (scale units in km). The three plots represent the assignment of pixels to each cluster: (a) Black Sea cluster; (b) Iberian cluster; and (c) North Atlantic cluster. The assignments of pixels to the fourth cluster are not shown, as no individuals are assigned to it ("ghost cluster", see text for further details). The highest membership values are in light yellow and the level curves illustrate the spatial changes in assignment values. The plot is based on the highest-probability run at that value of K.

Genetic diversity and differentiation among inferred populations

The three identified clusters differed greatly with respect to their genetic diversity assessed using heterozygosity and allelic richness, corrected for difference in sample size (Table 1). Harbour porpoises from Iberian waters and the Black Sea displayed comparable genetic diversity that was much lower than that observed in the NAt cluster. For example, the allelic richness over all loci was twice as low in the Black Sea and in Iberia as it was in the NAt cluster (Wilcoxon paired-sample test: BS-IB: p = 0.878; IB-NAt: p < 0.005; BS-NAt: p < 0.005).

Table 1 Genetic variation at the 10 microsatellite loci for populations inferred from the cluster analyses

The amount of genetic differentiation among clusters, estimated using F ST [24], illustrated the high divergence of Black Sea harbour porpoises from those in the North Atlantic (F ST: BS-IB = 0.314, 95% Confidence Interval (CI): 0.240–0.381; BS-NAt = 0.147, 95% CI: 0.116–0.179). The F ST values between Iberian porpoises and those sampled further north in the Atlantic were lower, but remained substantial (F ST: IB-NAt = 0.090, 95% CI: 0.054–0.131). In contrast, F ST values between parts of the NAt cluster (Figure 1 3A–C: ) were much lower (F ST ≤ 0.001; see Additional file 1). Figure 6 provides a global view of the system. It shows that for pairs of sampled localities from different clusters, genetic differentiation is much larger than that between intracluster pairs that have the same geographic distance. In other words, genetic differentiation between clusters is not only induced by geographic distance between them but also by barriers to gene flow.

Figure 6
figure 6

Genetic and geographic distance for pairs of sampled geographic areas. Yellow triangles indicate comparison between pairs of sampled localities within the same cluster; blue squares indicate pairs with one sampled locality in the NAt cluster and the IB cluster; red diamonds indicate pairs with one sampled locality in the NAt cluster and the BS cluster; and black circle indicate the comparison between the IB and the BS cluster.

Tests of departure from HW equilibrium (Table 1) show no significant deviation for porpoises from Iberia and the Black Sea, but a significant deficiency in heterozygosity at 9 of the 10 loci analysed in the NAt cluster. This slight heterozygote deficiency recorded at almost all loci in porpoises of northern Atlantic waters and the failure of the Structure model to assign these individuals in comparison to the Geneland model suggest that a subtle spatial structure (i.e., Wahlund effect) with a continuous gradation in allele frequencies across regions and/or isolation by distance could occur [23, 25].

Isolation by distance analyses

When IBD occurs in 'continuous' populations distributed in a two-dimensional habitat, genetic differentiation among individuals is expected to increase linearly with the logarithm of geographic distance [21, 26]. This linear relationship was demonstrated to hold best at local geographical scale because heterogeneity of demographic parameters (i.e., dispersal and/or density) and the effect of mutation rate are reduced and hence their confounding influence on genetic differentiation is also reduced [27, 28]. However, the scale of population ranges in the marine realm is often unknown and can be quite large (of the order of hundreds or thousands of kilometres squared), especially for cetacean species [29]. As we cannot know the appropriate scale a priori for the NAt cluster, we conducted the IBD analyses considering the range at three different spatial scales (Figure 1 and Table 2). We first analysed IBD in the global range of the NAt cluster that latitudinally extends over 3237 km from the French coast of the Bay of Biscay northwards to the arctic waters of Norway and Iceland (global scale). Then, we subdivided the global range into two parts of equal latitudinal range (Medium scale: NAt-2A and B), then into three parts (Small scale: NAt-3A-C) and repeated the analysis on each part.

Table 2 Isolation by distance within the North Atlantic cluster (NAt). Three different scales defined from the latitudinal subdivision of the global range in two (Medium scale) and in three (Small scale) parts were analysed. See Figure 1 for the delimitations of the NAt subdivisions.

We found a significant positive relationship between the index of genetic differentiation (a r ) and the marine geographic distance among porpoises in the NAt cluster at all scales considered (Table 2) except one: the region NAt-3B. This latter corresponds to the area where the sample size is the lowest (n = 141), where the sampling is the most spatially heterogeneous (Figure 1), and also where the marine distances among porpoises are the shortest (Table 2). Therefore, the absence of significant evidence in this region likely results from the low power of the analysis to detect IBD (see, for example, [30]).

Rousset [21, 26] demonstrated that the regression slope is proportional to 1/4π 2, where D is the effective density of individuals and σ2 the second moment of axial dispersal distance, best described as the mean squared parent-offspring axial dispersal distance. σ2 can be understood as a measure of the speed at which two gene lineages issuing from an ancestor move away from each other, as it is the rate at which the mean squared axial distance between these two lineage increases per time unit [30]. The comparison among subset areas at the medium and at small scale showed significant north-south variation in the parameters of the regression for the 10 microsatellite loci (Table 2). The slope (or 1/4π 2) in the south part of the NAt cluster was significantly higher than that in northern parts at medium scale (Wilcoxon paired-sample test, 2A-2B: p = 0.037) and at small scale (Wilcoxon paired-sample test, 3A-3B: p = 0.005; 3A-3C: p = 0.046; 3B-3C: p = 0.399), suggesting that either density (D), dispersal (σ2), or both are reduced in the south part compared to the north.

Recent migration rates among populations

Recent migration rates (i.e., within the last few generations) were estimated between porpoises from the Black Sea, Iberia and the southern part of the NAt cluster (NAt-3A) adjacent to the detected barrier to gene flow (Table 3) using the BayesAss v.1.3 algorithm [22]. When simulating the effect of having no information in the data from which to estimate migration rates, we obtained a 95% CI of 0.675–0.992 for the proportions of individuals derived from the source populations each generation (or non-migrant rates) and a CI of 0.001–0.261 for migration rates. Confidence intervals recovered from the data set were considerably smaller than those obtained from the null hypothesis (Table 3), suggesting that the data set contained an appreciable amount of information to support the results.

Table 3 Mean ± SD (95% CI) posterior distributions for migration rates among harbour porpoise populations. Values along the diagonal (bold) are the proportion of individuals derived from the source population (or non-migrant) each generation.

Virtually all porpoises from the Black Sea were identified as non-migrant (Table 3). Although this result is not surprising, as the Black Sea population is now geographically isolated from the Atlantic populations by the Mediterranean Sea, this result can be useful as reference to assess the status of the Iberian population. Almost all porpoises from Iberian waters were also identified as non-migrant (98% of the individuals and the 95% CI upper limit including 1), while the NAt cluster showed a slightly lower non-migrant proportion (96%; Table 3). The migration rates between Iberia and the NAt cluster were low (m ≤ 0.03) with the lower 95% CI bounds not different from 0, except in one case: the migration rate from Iberia to the NAt cluster appeared slightly higher than the reverse, but the large overlap of 95% CIs did not allow us to conclude there was asymmetry in migration rates.

Discussion

The individual-based approaches we used here revealed that most of the harbour porpoise range in the central and eastern North Atlantic behaves as a 'continuous' population that widely extends over thousands of kilometres from the French coasts of the Bay of Biscay northwards to the arctic waters of Norway and Iceland, with significant isolation by distance. This striking result is concordant with the low but sometimes significant level of genetic differentiation previously reported at microsatellite loci between arbitrarily defined groups in the North Sea and adjacent waters [31, 32]. However, strong barriers to gene flow in the south-eastern North Atlantic range isolate, on a relatively small scale, porpoises from Iberian waters and on a larger scale porpoises from the Black Sea.

The total isolation of harbour porpoises from the Black Sea has long been suggested on the basis of the lack of field observation of porpoises in the Mediterranean Sea [17], of private mtDNA alleles reported in that population [33], and of morphological differences [34]. Our results lend further support to this hypothesis. The pronounced genetic footprint of this isolation left at nuclear and mtDNA loci suggest this is an ancient isolation that might date back to the last Ice Age ([35] and Fontaine, unpublished results). The genetic differentiation detected at microsatellite loci between the Iberian porpoises and those further north was not apparent at the mtDNA control region previously analysed [35]. The lack of mitochondrial lineage sorting and of private microsatellite alleles suggests that the differentiation we observed with microsatellite analyses is caused by a more recent isolation process than that of the Black Sea.

The corollary of these results is the inference of strong barriers to gene flow in the southern Bay of Biscay and in the Mediterranean Sea that isolate almost completely the Iberian and Black Sea populations. These barriers coincide with strong oceanographic changes of similar nature (compare Figure 5 with Figures 1 and 7). To take them in turn, the conditions in the southern Bay of Biscay differ sharply from those at its margins [36, 37]. The continental shelf, widely extended in the northern part, narrows considerably to the south and is cleaved asunder by the Cap Breton canyon, which drops to the abyssal plain in the south-east, only 10 km from the shore. Warm and oligotrophic surface water spreads from the Cap Breton canyon to cover half of the southern Bay in summer [36, 37]. In contrast, off the Iberian Atlantic coast upwelling becomes evident from late spring to early autumn [38], bringing to the surface cold nutrient-enriched waters that support a rich food-web [39]. On the north side of the barrier, shallow, cold, and nutrient rich waters prevail most of the year from the French waters of the Bay of Biscay northward to the northern North Sea. From a biogeographical point of view, the southern Bay of Biscay is not only a barrier for porpoises but it is also a transition zone between the boreal and subtropical provinces, with many species reaching their southern or northern limit of distribution in that area [40].

Figure 7
figure 7

Climatological (1997–2006) annual sea surface chlorophyll concentrations. Data obtained with Sea-viewing Wide Field-of-view Sensor (SeaWIFS, modified from [80]).

Still further north, depth increases towards Nordic Seas (Figure 1), but waters remain cold and highly productive [41]. However, the bathymetric change does not seem to restrict gene flow in Nordic Seas, consistent with sightings of some porpoises reported far offshore in deep water [42]. While this suggest there are few, if any, potential barriers to dispersal of porpoises from the northern Bay of Biscay up to Arctic waters, the heterozygosity deficit related to the detected IBD shows nevertheless that porpoises do not mate randomly over that extended area and that gene flow is spatially restricted. We observed a north-south variation in the IBD pattern with higher IBD slope at the southern end of this range compared to northern parts (Table 2). One could argue that this north-south variation in IBD pattern might reflect drift disequilibrium [43] in northern areas associated with the postglacial porpoise recolonisation of Nordic waters in contrast to the southern habitats, which likely remained more stable in time. However, simulation-based sensitivity analysis of current 2 estimation to demographic instability in time and space conducted by Leblois et al [44] showed that spatial expansion with constant density does not significantly affect present-time 2 estimation, especially when the spatial expansion occurred 20 or more generations ago, as it is the case for postglacial recolonisation. Consequently, the higher IBD slope detected in the southern area (NAt-3A) compared to that in waters further north (NAt-3B and NAt-3C) most likely represents a lower current-time 2. Although we cannot exclude variation in σ2, a lower porpoise density in southern waters is supported by field estimates based on aerial and ship surveys conducted in the North Sea and adjacent waters [45, 46]. These variations in density (and maybe in dispersal patterns) likely reflect variation in habitat. The southern part of the 'continuous' population (i.e., the northern part of the Bay of Biscay, the English Channel and the southernmost part of the North Sea) borders the barrier to gene flow detected in the southern Bay of Biscay and should thus display sub-optimal conditions for porpoises while the middle (i.e., the central and northern North Sea) and northern areas (i.e., the Nordic Seas) would be more optimal for a cold water species such as the harbour porpoise.

The Mediterranean Sea displays similar characteristics to those encountered in the southern Bay of Biscay but at a larger scale. The Mediterranean is composed mostly of deep basins and narrow continental shelves with warm oligotrophic surface waters prevailing most of the year [47]. These characteristics are likely quite unfavourable for cold water species and might explain why the harbour porpoise is absent from this area. The oceanographic conditions in the Black Sea are, by contrast, more suitable for harbour porpoises with low salinity, colder and more nutrient rich surface waters than in the Mediterranean Sea [48]. There are however reports of porpoise strandings in the northern Aegean Sea [17]. This can be understood with regard to oceanographic features in that area. The subdivision of the Aegean into two basins has long been recognised. The northern basin is under the influence of cold, low salinity waters that pour out of the Black Sea. This water is entrained into a cyclonic circulation affecting the northern and western parts of the Aegean, causing an ecological isolation of the northern basin from the southern basin [49]. In the southern basin the continental shelf is very limited and the waters become quickly characteristic of Mediterranean waters [50], unfavourable for harbour porpoises.

To summarise, surface water temperature and primary production seem to be the factors that best characterise the nature of barriers to gene flow encountered across the harbour porpoise range, their population structure, and their geographic distribution. It is worth noting however that in oceanography, these two parameters are often linked [51]. Indeed, the sea surface temperature acts as a useful proxy for other physical processes, such as vertical stratification and nutrient contents, regulating the size structure, taxonomic composition, and abundance of the phytoplankton community, and thus the food availability for top predators [52, 53]. These results reinforce previous ecological studies on harbour porpoises that reported significant relationships between abundances and movements with sea surface temperature and food availability [54, 55]. Although bathymetry can be important in harbour porpoise ecology [56, 57], we showed that this factor alone seems not to restrict gene flow in northern waters of the sampling range.

While the proximal causes of porpoise dependence on these habitat characteristics are beyond the scope of this paper, the ultimate underlying mechanism is likely related to the high energetic constraints this small cetacean has to face in order to survive. As one of the smallest endothermic marine predators, and furthermore with limited energy storage capacity, it is currently assumed that harbour porpoises must feed frequently without prolonged periods of fasting [16, 58]. Their distribution, their movements, and in sum their overall biology should therefore be closely related to those of their prey and thus to nutrient rich waters.

Conclusion

In the marine realm, community structure is shaped heavily by physical processes (see, for example, [47, 59]). In this study we provide for the first time strong evidence that physical processes determining food availability have major impacts on the demographic and genetic structure of a cetacean. The small body size of harbour porpoises undoubtedly has profound consequences at all levels of their biology and makes this species particularly sensitive to habitat variation. We can however reasonably expect that this will be also applicable to other cetaceans of similar body size, habitat and thermoregulation constraints. However, these constraints could be reduced for larger cetaceans, leading to more complex patterns of population structure not necessarily correlated to seascape features (see, for example, [11]). Ecosystems in the eastern North Atlantic are shifting toward a warmer dynamic equilibrium with significant changes detected in plankton and fish assemblages [51, 6062], but the consequences for marine mammals remain to date unclear [63]. Although further analyses would be require to address the demographic trends of these populations, the genetic pattern highlighted here (i.e., the ancient isolation of harbour porpoises in the Black Sea), the more recent isolation of those in Iberian waters, and the higher IBD in the southern end of the northern Atlantic continuum, suggests that habitat-related fragmentation of harbour porpoise range is under way and that it is likely to continue with the predicted changes in climate.

Methods

Sample collection

Tissue samples were taken from by-caught and stranded harbour porpoises. A total of 752 animals distributed along the eastern North Atlantic range of the harbour porpoise and in the Black Sea were analysed (Figure 1). Out of these, 515 samples were analysed in this study and 237 samples from Iceland and Ireland were analysed by Duke [64].

Most of the individuals were geo-referenced using GPS coordinates recorded at the time or deduced from the reported location where the animal was found. These coordinates are naturally rough approximations to the normal locations of animals, especially for stranded animals, but this error can be considered negligible at the scale of the study range. This source of error can also be taken into account in some of the spatial analyses (see below).

DNA extraction and microsatellite analysis

Total genomic DNA was extracted from tissues using the DNeasy™ Tissue Kit (Qiagen) following the manufacturer's recommendations. Samples were genotyped at 10 microsatellite loci using the multiplex sets defined in [65]. Polymerase chain reaction conditions were as reported in [65]. Amplified DNA was analysed for length variations on an automated 96 capillary MegaBace-1000 DNA Analyser (Amersham Biosciences) using Genetic Profiler v.1.5 (Amersham Biosciences).

Habitat characteristics

Data on habitat characteristics across the study range with respect to salinity and sea surface temperature were taken from the National Oceanographic Data Centre (NODC) [66]. Bathymetric data were extracted from the ETOPO2 dataset available on the US National Geophysical Data Centre (NGDC) [67] and the data on chlorophyll concentration were taken from the NASA Sea-viewing Wide Field-of-view Sensor database (SeaWIFS) [68].

Clustering analyses

We applied two Bayesian model-based clustering algorithms to infer population structure and to assign individuals (probabilistically) to clusters without a priori knowledge of population units and limits.

Structure procedure

The first approach, implemented in Structure v.2.1, uses individual multilocus genotype data to cluster individuals into K groups while minimising Hardy-Weinberg disequilibrium and gametic phase disequilibrium between loci within groups [18, 19]. The estimation procedure consists of running trial values of the number of populations K and then comparing the estimated log probability of data under each K, Ln [Pr(X|K)]. We conducted a series of independent runs with different proposals for K, testing all values from 1 to 10. Each runs used 106 iterations after a burn-in of length 4 × 104, testing different models: (a) with or without admixture, and (b) correlated or uncorrelated allele frequencies. To check for convergence of the Markov chain Monte Carlo (MCMC), we performed 10 replicates for each value of K and then checked the consistency of results. The estimated number of clusters (K) was taken to be the value of K with the highest Pr(X|K) [18].

Geneland procedure

The second algorithm, implemented in Geneland v.1.0.7, differs from that of Pritchard et al [18] mainly by taking into account explicitly the spatial dependence of individuals expected for species whose range is much larger than the average intergeneration movement of individuals. This model aims at inferring and locating genetic discontinuities between populations in space from individual geo-referenced multilocus genotypes, while taking into account uncertainty in the location of sampled individuals [20, 69]. All the parameters (including K) are co-estimated simultaneously by the MCMC algorithm. However, for technical reasons discussed in [20], it is better to proceed in two steps: a first run to infer K, and a second run with K fixed at the modal value to estimate the other parameters (mainly the assignment of individuals to the inferred populations). The first step was replicated 10 times to check for convergence, allowing K to vary from 1 to 10 clusters and using the following run parameters: 106 MCMC iterations, maximum rate of Poisson process fixed at 700, maximum number of nuclei in the Poisson-Voronoi tessellation fixed to 500, and an uncertainty associated with the spatial coordinates of 50 km. We used the Dirichlet model of allelic frequencies as it has been demonstrated to perform better than the alternative model [20]. We inferred the number of clusters (K) from the modal value of K for these 10 runs, and then ran the MCMC again 100 times with K fixed for this value, 5 × 105 MCMC iterations, and the other parameters unchanged. We calculated the mean logarithm of posterior probability of the data (PPD) for each of the 100 runs and selected the 10 with the highest PPD. These 10 runs were then post-processed (with a burn-in of 5 × 104 iterations) in order to obtain posterior probabilities of population membership for each individual and each pixel of the spatial domain (174 pixels along the X axis and 143 along the Y axis corresponding to a pixel size of 25 km side). We finally checked visually for the consistency of results across these 10 runs.

Descriptive statistics among clusters

The allelic richness, corrected for difference in sample size, the observed (H o ) and expected (H e ) heterozygosity (or genetic diversity), and F IS values were calculated within each cluster using Fstat v.2.9.3 [70]. To test whether genetic diversity was significantly different between clusters, we applied a Wilcoxon paired-sample test [71] on the 10 single locus values of the statistics of interest.

Level of genetic differentiation at microsatellite loci among clusters was estimated as F ST after Weir and Cockerham [24] using Fstat v.2.9.3 [70]. The 95% confidence interval was calculated using 15000 bootstrap resamplings [70]. We conducted exact tests to assess deviations from Hardy-Weinberg equilibrium and test for population differentiation using Genepop v.3.4 [72].

Isolation by distance analysis

In continuous populations, an isolation by distance pattern occurs when genetic differentiation among individuals increases with their geographic distance [73]. Here we consider the statistic a r , a multilocus estimator of an F ST/(1-F ST) analogue between pairs of individuals [21]. When a continuous population is represented by a two dimensional lattice (i.e., fixed individual positions and no spatial density heterogeneity), a r is approximately linearly related to the logarithm of the geographic distance between individuals (r), a r ≈ (Ln(r)/4π 2) + C, where D is the effective density of individuals, σ2 is the second moment of the dispersal distance distribution, and C is the value of the linear approximation at r = 1 length unit. Values of a r were regressed against the log of the marine geographical distance (see below) between paired individuals, as described in Rousset et al [21]. Significance of the regression slope was tested by 105 random permutations of individual locations (similar to a Mantel test) using the computer program SPAGeDi v.1.2 [74]. Assuming low mutation rate, the inverse of the regression slope provides an estimate of the product 4π 2 [21, 26]. To test whether the regression slopes significantly differed between the different parts of a same scale, we used a Wilcoxon paired-sample test [71] applied on the 10 single locus values of the regression slope.

In the marine realm, the Euclidean distance between individuals might not be representative of the effective geographic distance separating them. Therefore, we computed an effective marine geographic distance between individuals using the least-cost path (LCP) algorithm implemented in the Pathmatrix extension [75] of the geographical information system software ArcView v.3.X (ESRI, Redlands, CA, USA). This algorithm computes a deterministic LCP between a source point and a target point by using a friction (or resistance) layer. The friction layer is a raster map where each cell (landscape unit) expresses the relative difficulty (or cost) of moving through that cell. A LCP minimises the sum of costs of all cells along the path (for detailed description and discussion of the algorithm, see [76]). In the present study, a uniform cost was attributed to all sea cells, while land cells harboured an "infinite" cost. This allowed us to compute effective distances avoiding landmasses. The sea/land map was obtained by rasterising (at 2 km resolution) a polygon version [77] of the GSHHS shoreline dataset v.1.3 [78]. The computations were performed using a gnomonic projection around the centroid of the sampled localities, which minimises the map deformation in planar distances induced by the curvature of the earth (Baird personal communication). Finally, the length of pairwise LCP (in meters) was introduced as the geographic distance matrix separating pairs of individuals in the regression analyses described above.

Migration rates among clusters

Evidence of recent migration events across clusters was assessed using the Bayesian multilocus genotyping procedure implemented with MCMC methods in BayesAss v.1.3 [22]. This approach does not require populations to be in either migration-drift or Hardy-Weinberg equilibrium. To examine the strength of the information in the porpoise microsatellite data set, 95% confidence intervals were determined for migration rates and compared to a scenario where all proposed changes throughout the Markov chain are accepted (thereby simulating the situation where any information that could exist in the data is insufficient to affect the posterior distribution of migration rates, as suggested by the authors). The MCMC was run for a total of 3 × 106 iterations, with the first 106 discarded as a burn-in to allow the chain to reach stationarity. Samples were collected every 2000 iterations to infer posterior probability distributions of parameters of interest.