In the 1980s, Alterra started collecting detailed data on the occurrence and distribution of pine martens in the Netherlands (e.g. Müskens and Broekhuizen 1986). From 1991 onwards, this survey was intensified in collaboration with the Dutch Pine Marten Working Group (e.g. Thissen et al. 2010), and from 1992 onwards it included the collection of tissue samples from dead individuals (mainly traffic casualties). Between 1992 and 2010, a total of 251 dead individuals were collected throughout the country (Fig. 1) and delivered to Alterra, where a small sample of tongue tissue was removed for DNA extraction during post-mortem investigation. Samples of 127 individuals were collected from 1992 to 2000, mainly in the four core habitat areas (see Fig. S2 of the Online Electronic Materials). The remaining 124 tissue samples were collected from 2001 to 2010 and included samples from other parts of the country where pine marten numbers had been increasing, such as the coastal area (see Fig. S3). Additionally, 66 faecal samples were collected between April 2009 and May 2010 in the Wieden-Weerribben swamp forest. Samples were immediately submerged in 96 % ethanol and stored at −21 °C in the laboratory.
DNA was extracted from tissue and faecal samples using the DNeasy Blood and Tissue Kit and the QIAamp DNA Stool Mini Kit (both QIAGEN), respectively. Nine microsatellite loci were amplified: GG454 (Walker et al. 2001), MA1, MA2 and MA4 (Davis and Strobeck 1998), MEL1 and MEL6 (Bijlsma et al. 2000), MEL10 (Domingo-Roura 2002), and MVI20 and MVI57 (O’Connell et al. 1996). For DNA extracts from faeces, a tenth locus (DBY7Ggu, located on the Y chromosome) was amplified for sex determination. Following tests in a pilot study, PCR reactions were optimized to a volume of 10 µl containing 0.3 units of Taq polymerase (Invitrogen Taq DNA polymerase), PCR buffer and W-1 in the amounts specified by the Invitrogen protocol, 100 nM of each primer, 200 µM of each dNTP, 3 mM MgCl₂ and 320 µg/ml BSA. Each reaction contained 2 µl of DNA template, corresponding to between 4 and 40 ng DNA for tissue extracts. The PCR programme for tissue samples was: 94 °C for 2 min; 30 cycles of 94 °C for 30 s, 56–64 °C for 30 s and 72 °C for 30 s; and a final extension of 1 min at 72 °C. For faecal samples, a different programme was used: 94 °C for 2 min; 36 cycles of 94 °C for 30 s, 56–64 °C for 30 s and 72 °C for 1 min; and a final extension of 20 min at 72 °C. Forward primers were labelled with either IRD-700 or IRD-800 for analysis on a Li-Cor 4300 platform. All samples were first amplified at locus MEL10, which distinguishes pine martens from stone martens (Martes foina; Pilot et al. 2007); samples that failed to amplify or were identified as stone marten were discarded. Genetic profiles of tissue samples were based on a single PCR reaction per locus. Genotypes with missing data at more than one locus were considered unreliable and were excluded from the data analysis. For faecal samples, a multi-tube approach with three independent PCR reactions per locus per sample was used to minimize genotyping errors (allelic dropout and false alleles).
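The missing-data rule for tissue genotypes amounts to a simple filter on each multilocus profile. The sketch below assumes each profile is stored as a mapping from locus name to a pair of allele sizes, with `None` marking a locus that failed to amplify; this data layout and the names `is_reliable` and `filter_profiles` are hypothetical and do not belong to any software used in this study.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: a genotype is a pair of allele sizes (bp); None = no amplification.
Genotype = Optional[Tuple[int, int]]

LOCI = ["GG454", "MA1", "MA2", "MA4", "MEL1", "MEL6", "MEL10", "MVI20", "MVI57"]


def is_reliable(profile: Dict[str, Genotype], max_missing: int = 1) -> bool:
    """Return True if the profile lacks data for at most `max_missing` of the nine loci."""
    missing = sum(1 for locus in LOCI if profile.get(locus) is None)
    return missing <= max_missing


def filter_profiles(
    profiles: Dict[str, Dict[str, Genotype]]
) -> Dict[str, Dict[str, Genotype]]:
    """Keep only individuals whose multilocus genotype passes the missing-data rule."""
    return {ind: prof for ind, prof in profiles.items() if is_reliable(prof)}
```

A profile missing one locus is retained, whereas a profile missing two loci is dropped.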
Following the approach described in Koelewijn et al. (2010), we first amplified a single locus (MEL10) for all faecal samples. The remaining loci were amplified only for samples that yielded the same MEL10 genotype in all three replicate reactions. Samples that did not yield consistent genotypes across replicates at all loci were considered unreliable and were excluded from the data analysis.
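The two-stage multi-tube scoring can be sketched as follows, assuming that a locus is called only when all three tubes amplify and agree (allele order ignored), and that a single inconsistent locus disqualifies the whole sample. The function names and data layout are hypothetical illustrations, not part of the Koelewijn et al. (2010) protocol or any program used here. For example, allelic dropout typically appears as one tube scoring a homozygote where the other two score a heterozygote, which this rule rejects.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[int, int]
# One entry per PCR tube; None means the tube did not amplify (hypothetical layout).
Replicates = List[Optional[Genotype]]


def consensus(replicates: Replicates, n_tubes: int = 3) -> Optional[Genotype]:
    """Return the genotype only if every tube amplified and all tubes agree."""
    if len(replicates) != n_tubes or any(g is None for g in replicates):
        return None
    calls = {tuple(sorted(g)) for g in replicates}  # ignore allele order
    return calls.pop() if len(calls) == 1 else None


def score_faecal_sample(
    reps_by_locus: Dict[str, Replicates], screen_locus: str = "MEL10"
) -> Optional[Dict[str, Genotype]]:
    """Return consensus genotypes for all loci, or None if the sample is discarded."""
    # Stage 1: only samples with a consistent screening-locus genotype go further.
    if consensus(reps_by_locus.get(screen_locus, [])) is None:
        return None
    # Stage 2: every locus must be consistent across tubes.
    calls: Dict[str, Genotype] = {}
    for locus, reps in reps_by_locus.items():
        call = consensus(reps)
        if call is None:
            return None
        calls[locus] = call
    return calls
```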
Tests for genotypic linkage disequilibrium in the total dataset were performed for all pairs of loci using a log-likelihood ratio G-test with Bonferroni correction in FStat (v.2.9.3; Goudet 1995). The same software was used to test each locus for deviations from Hardy–Weinberg equilibrium (HWE) across all samples. We used MicroChecker v.2.2.3 (Van Oosterhout et al. 2004) to check each locus for null alleles, genotyping errors and allelic dropout.
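With nine loci there are 9 × 8 / 2 = 36 locus pairs, so a standard Bonferroni correction at α = 0.05 gives a per-test level of 0.05 / 36 ≈ 0.0014. The sketch below shows only this adjustment; it assumes the P-values of the pairwise tests are already available, and FStat computes its own adjusted nominal level internally, so its threshold may differ in detail. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

LOCI = ["GG454", "MA1", "MA2", "MA4", "MEL1", "MEL6", "MEL10", "MVI20", "MVI57"]


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def significant_pairs(
    p_values: Dict[Tuple[str, str], float], alpha: float = 0.05
) -> List[Tuple[str, str]]:
    """Locus pairs whose linkage disequilibrium test stays significant after correction."""
    threshold = bonferroni_alpha(len(p_values), alpha)
    return [pair for pair, p in p_values.items() if p < threshold]


# All unordered locus pairs: 9 loci give 36 pairs.
pairs = list(combinations(LOCI, 2))
```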
Inference of population structure
Although the distribution of pine martens in the Netherlands centres on a few core habitats, sampling localities were scattered across the country (Fig. 1). No detailed data were available on reproduction outside the core habitats. Because one of the primary goals of this study was to assess whether geographical barriers separate different mating groups, we did not subdivide the dataset into a priori defined populations. Instead, we applied individual-based Bayesian clustering methods to infer the number of populations and to assign each individual to a population, and subsequently assessed levels of genetic variation within, and gene flow between, the inferred populations (Ball et al. 2010). Two complementary Bayesian clustering programs were used. We first used STRUCTURE (v.2.3.3; Pritchard et al. 2000), which estimates the probability Pr(X|K) of the data given a predefined number of populations (K) and, for each individual, the probability q of belonging to each cluster. We ran STRUCTURE for K = 1 to K = 10 under the admixture model. Although a model with correlated allele frequencies would be consistent with the hypothesized subdivision due to habitat fragmentation, we could not exclude the possibility that substructure was already present before this fragmentation took place; we therefore repeated the analysis assuming uncorrelated allele frequencies. Runs were replicated five times for each value of K, using 500,000 MCMC iterations and a burn-in period of 50,000 iterations. We then determined ∆K (Evanno et al. 2005) using STRUCTURE HARVESTER (Earl and vonHoldt 2011) to infer the optimal number of clusters. Second, we applied GENELAND (v.4.0.0; Guillot et al. 2008), which incorporates the spatial coordinates of the samples, again testing both the correlated and uncorrelated allele frequency models.
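The ∆K statistic of Evanno et al. (2005) measures the second-order rate of change of ln Pr(X|K) between successive K values, scaled by the standard deviation of ln Pr(X|K) among replicate runs at that K. The sketch below assumes the second-order difference is taken on the mean ln Pr(X|K) across replicates, which is one common way of computing it; STRUCTURE HARVESTER's exact implementation may differ in detail. ∆K is undefined at the smallest and largest K tested (here K = 1 and K = 10). The function name and input layout are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Evanno's Delta K from replicate ln Pr(X|K) values keyed by K.

    Only K values with both neighbours (K - 1 and K + 1) present are returned.
    """
    result: Dict[int, float] = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)| using replicate means
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        sd = stdev(lnp[k])  # sample standard deviation across replicate runs
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result
```

The K with the largest ∆K is taken as the uppermost level of structure, e.g. `max(dk, key=dk.get)` for `dk = delta_k(lnp)`.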
Ten replicate runs were performed (500,000 iterations per run, a burn-in period of 50,000 iterations, a thinning interval of 100, a maximum of 300 nuclei, individual admixture, and filtering for null alleles). K was set to the value inferred in the run with the highest average log posterior probability (ALPP; Guillot et al. 2012), and the results of the five best runs with this K value were selected for further analysis.
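The run-selection rule can be sketched as below, assuming each GENELAND run has already been summarized by its average log posterior probability and its inferred K. The `GenelandRun` record and `select_runs` function are hypothetical helpers for illustration; they are not part of the Geneland package.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GenelandRun:
    run_id: int
    mean_log_posterior: float  # average log posterior probability after burn-in
    k: int                     # number of clusters inferred by this run


def select_runs(runs: List[GenelandRun], n_keep: int = 5) -> Tuple[int, List[GenelandRun]]:
    """Take K from the best run, then keep the n_keep best runs sharing that K."""
    best = max(runs, key=lambda r: r.mean_log_posterior)
    same_k = sorted(
        (r for r in runs if r.k == best.k),
        key=lambda r: r.mean_log_posterior,
        reverse=True,
    )
    return best.k, same_k[:n_keep]
```

Runs that reach a higher posterior under a different K are excluded from the selected set, so all downstream comparisons use runs with a common K.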
For each model, we assessed the repeatability of the cluster assignments and aligned the cluster labels of the five replicate runs using CLUMPP (Jakobsson and Rosenberg 2007). Each individual's modal cluster assignment was defined as the cluster for which it had the highest q-value.
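Replicate runs can label the same clusters differently (label switching), so the q-matrices must be aligned before averaging. The sketch below is a simplified stand-in for CLUMPP: it brute-forces all K! column permutations of each replicate (as CLUMPP's full-search option does), but scores them by the squared difference to the first run rather than by CLUMPP's own similarity statistic. It then assigns each individual to the cluster with the highest mean q. All names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence

Matrix = List[List[float]]  # rows: individuals, columns: clusters (q-values)


def align(reference: Matrix, run: Matrix) -> Matrix:
    """Permute the columns of `run` to best match `reference` (brute force over K!)."""
    k = len(reference[0])

    def cost(order: Sequence[int]) -> float:
        return sum(
            (ref_row[c] - run_row[order[c]]) ** 2
            for ref_row, run_row in zip(reference, run)
            for c in range(k)
        )

    best = min(permutations(range(k)), key=cost)
    return [[row[best[c]] for c in range(k)] for row in run]


def modal_assignments(runs: Sequence[Matrix]) -> List[int]:
    """Align all runs to the first, average q, and return each individual's modal cluster."""
    reference = runs[0]
    aligned = [reference] + [align(reference, r) for r in runs[1:]]
    n, k = len(reference), len(reference[0])
    mean_q = [[sum(m[i][c] for m in aligned) / len(aligned) for c in range(k)] for i in range(n)]
    return [max(range(k), key=lambda c: row[c]) for row in mean_q]
```

Brute force is practical for the small K values typical of such analyses, because the number of permutations grows as K!.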
Inferring the number of clusters with the ∆K method may detect only the uppermost level of hierarchical structure (Evanno et al. 2005). We therefore applied a hierarchical approach, in which the dataset was split according to the K detected clusters and the same Bayesian clustering analyses were repeated within each cluster to test for further substructure (Evanno et al. 2005).
Genetic differentiation between the core habitat areas was quantified as $G_{ST}$ (Nei 1987). SMOGD (v.1.2.5; Crawford 2010) was used to calculate unbiased estimates of $H_S$ and $H_T$ (Nei 1973) per locus. Values for both metrics were averaged across loci before calculating $G_{ST}$.
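Nei's differentiation measure is $G_{ST} = (H_T - H_S)/H_T$, where $H_S$ is the mean within-population gene diversity and $H_T$ the total gene diversity. The sketch below assumes the per-locus unbiased estimates (as reported by SMOGD) are already available and, following the averaging order described above, averages them across loci before forming the ratio. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def gst_from_loci(hs_per_locus: Sequence[float], ht_per_locus: Sequence[float]) -> float:
    """Nei's G_ST computed from H_S and H_T that are first averaged across loci."""
    hs = mean(hs_per_locus)  # mean within-population gene diversity
    ht = mean(ht_per_locus)  # mean total gene diversity
    return (ht - hs) / ht
```

Taking the ratio of averages weights each locus by its $H_T$, so the result generally differs from the mean of per-locus $G_{ST}$ values.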
Principal component analysis (PCA), based on binary presence/absence data for each allele at each locus, was applied to assess the level of variation in allelic composition among individuals in the four core habitat areas, as well as within and between the inferred clusters. This analysis complements the Bayesian algorithms because it addresses a different question: it groups individuals by their similarity in allelic variants rather than by expectations under HWE (De Groot et al. 2012). It therefore allows us to assess the extent to which the clusters inferred by the Bayesian algorithms contain different combinations of alleles, rather than simply differing in allele frequencies. The analysis was conducted in PC-ORD (v.6.0; McCune and Mefford 1999). Geographic X and Y coordinates were included as quantitative variables in a secondary matrix to check for the effect of spatial distribution on genetic composition.
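The binary coding and a covariance-based PCA can be sketched as follows: each (locus, allele) combination becomes a 0/1 column, and first-axis scores are obtained by power iteration on the covariance matrix. This is an illustrative stand-in under stated assumptions, not PC-ORD's implementation (PC-ORD offers several cross-products and scaling options); the function names and data layout are hypothetical, and the sign of a principal axis is arbitrary.

```python
import random
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]


def binary_allele_matrix(profiles: Dict[str, Dict[str, Genotype]]):
    """Return (individual ids, (locus, allele) columns, 0/1 presence matrix)."""
    ids = sorted(profiles)
    columns = sorted({(locus, allele)
                      for prof in profiles.values()
                      for locus, g in prof.items() if g is not None
                      for allele in g})
    matrix = []
    for ind in ids:
        row = []
        for locus, allele in columns:
            g = profiles[ind].get(locus)
            row.append(1 if g is not None and allele in g else 0)
        matrix.append(row)
    return ids, columns, matrix


def first_pc_scores(matrix: List[List[int]], iterations: int = 500, seed: int = 0) -> List[float]:
    """Scores on the first principal component (covariance PCA via power iteration)."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.5 for _ in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]  # converges to the leading eigenvector (loadings)
    return [sum(r[j] * v[j] for j in range(p)) for r in centred]
```

Individuals that share allelic variants receive first-axis scores of the same sign, which is the kind of grouping the PCA is used to detect.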
Influence of potential confounding factors
The presence of full siblings among the genotyped individuals was inferred with COLONY (Wang and Santure 2009), assuming male polygamy and female monogamy, a diploid dioecious species, inbreeding, and unknown population allele frequencies. We accepted only full-sib dyads for which the maximum likelihood analysis reported a probability of 1. Identified sibs were then removed before the Bayesian clustering analyses, as the presence of close relatives may lead to ambiguous clustering output (Rodríguez-Ranilo and Wang 2012).
Furthermore, sampling spanned 18 years (1992–2010), a period during which urbanization of the Dutch landscape continued to some extent and the distribution of pine martens may have changed (Thissen et al. 2010). To check for temporal differences in the distribution of genetic variation, we split the dataset into samples collected before and after 1 January 2001 and repeated the full suite of Bayesian clustering analyses described above on both subsets, to check whether the same clusters were obtained. We checked for a temporal change in spatial differentiation by calculating $G_{ST}$ between the main geographical regions identified in the clustering analyses, for the total dataset as well as for the subsets of samples collected before and after 1 January 2001.
Likewise, because we sampled mainly traffic casualties, dispersing individuals may have been oversampled. Since male-biased dispersal has been observed in the Netherlands, we reanalysed the dataset including only females (36 % of the individuals) or only adults (52 % of the individuals). Both subsets were again analysed using all four Bayesian clustering models (STRUCTURE and GENELAND, each with correlated and uncorrelated allele frequencies).
Analyses of genetic variation
To check for a recent genetic bottleneck resulting from reduced population sizes in the 20th century, we tested for deviations from the normal L-shaped allele frequency distribution using the software package BOTTLENECK, assuming the infinite alleles model (IAM) of mutation (Cornuet and Luikart 1996).
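In a stable population, rare alleles are the most numerous, so a histogram of allele frequencies pooled over loci is L-shaped; after a recent bottleneck, rare alleles are lost first and the mode can shift to an intermediate class. The sketch below assumes ten equal-width frequency classes and a simple "no class exceeds the lowest class" criterion; class boundaries in BOTTLENECK may be handled differently, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def allele_frequencies(copies: Sequence[int]) -> Dict[int, float]:
    """Relative frequency of each allele among the gene copies at one locus."""
    counts = Counter(copies)
    return {allele: c / len(copies) for allele, c in counts.items()}


def frequency_histogram(loci: List[Dict[int, float]], n_classes: int = 10) -> List[int]:
    """Number of alleles (pooled over loci) in equal-width frequency classes."""
    hist = [0] * n_classes
    for freqs in loci:
        for f in freqs.values():
            hist[min(int(f * n_classes), n_classes - 1)] += 1
    return hist


def is_l_shaped(hist: List[int]) -> bool:
    """True if no class holds more alleles than the lowest-frequency class."""
    return hist[0] >= max(hist[1:])
```

This is only a qualitative check; it gives no P-value.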
Various measures of genetic diversity were calculated for the total set of samples, as well as for the inferred clusters. The mean number of alleles per locus ($A$) and the mean number of private alleles per locus ($A_P$) were calculated by hand. Allelic richness ($A_R$) was calculated by rarefaction in FStat. Observed heterozygosity ($H_O$) was calculated in Genepop (Raymond and Rousset 1995). Expected heterozygosity ($H_E$; Nei 1987) and the fixation index $F_{IS}$ were calculated in FStat. Randomization tests were applied to test for significant deviations from HWE within clusters. We assessed isolation by distance (IBD) by performing an individual-based analysis in Genepop. We tested for a difference in $A_R$ between the first and second decade of sampling with a paired-sample t test in SPSS (v.22; IBM Corp.), using markers as replicates.
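The single-locus diversity measures and the paired comparison can be sketched as below. Allelic richness uses the rarefaction formula $A_R(g) = \sum_i [1 - \binom{N - N_i}{g} / \binom{N}{g}]$, where $N$ is the number of gene copies, $N_i$ the count of allele $i$ and $g$ the rarefaction size (typically the smallest sample size, in gene copies). $H_E$ uses the common unbiased estimator $\frac{2n}{2n-1}(1 - \sum_i p_i^2)$ for $n$ diploid individuals, and $F_{IS}$ is shown as the simple ratio $1 - H_O/H_E$. These are assumptions for illustration: FStat's gene diversity includes an additional correction for observed heterozygosity and its $F_{IS}$ follows Weir and Cockerham, so values may differ slightly. The function names are hypothetical.

```python
from collections import Counter
from math import comb, sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Genotype = Tuple[int, int]


def _copies(genotypes: Sequence[Genotype]) -> List[int]:
    return [a for gt in genotypes for a in gt]


def allelic_richness(genotypes: Sequence[Genotype], g: int) -> float:
    """Expected number of alleles in a random subsample of g gene copies."""
    copies = _copies(genotypes)
    n = len(copies)
    if g > n:
        raise ValueError("rarefaction size exceeds the number of gene copies")
    # math.comb returns 0 when g > n - n_i, i.e. the allele is always sampled
    return sum(1 - comb(n - n_i, g) / comb(n, g) for n_i in Counter(copies).values())


def expected_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Unbiased gene diversity: 2n / (2n - 1) * (1 - sum of squared allele frequencies)."""
    copies = _copies(genotypes)
    n = len(copies)
    sum_p2 = sum((c / n) ** 2 for c in Counter(copies).values())
    return n / (n - 1) * (1 - sum_p2)


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def f_is(genotypes: Sequence[Genotype]) -> float:
    return 1 - observed_heterozygosity(genotypes) / expected_heterozygosity(genotypes)


def paired_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Paired t statistic (df = len(x) - 1), e.g. per-locus A_R in two decades."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))
```

With the nine loci as replicates, the paired t test has 8 degrees of freedom; the sketch returns only the statistic, not a P-value.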