The history of the European fallow deer (Dama dama) is complex. Palaeontological findings indicate that prior to the last glacial maximum, European fallow deer were spread throughout continental Europe (Chapman and Chapman 1980). However, during the last glacial maximum they are thought to have retreated into geographically separate refugia in south-eastern Europe (Turkey, the Balkans and possibly Italy; see Masseti and Vernesi 2015; Masseti 1996). Unlike other cervid species, such as roe and red deer, fallow deer did not naturally recolonise Northern and Western Europe from these locations (see Stuart 1991). Instead, today’s presence of fallow deer in this region, and indeed the wider world, is directly attributed to movement by man. This species has captivated humans since the Neolithic; consequently, fallow deer represent the most well-travelled of all cervid species. In a paper assessed by Marchesini et al. (2020) in this volume, Baker et al. (2017) report on the population and conservation genetics of European fallow deer, including inference on the relative importance of natural and anthropogenic processes. Baker et al. (2017) describe their aim as drawing ‘inference about effective conservation strategies’ in the context of these processes. Marchesini et al. (2020) instead focus on testing the signature from glacial refugia on modern population structure in this species, as reflected in their title where they state that there is ‘no genetic signature of glacial refugia’. Baker et al. (2017) do address this question, and suggest a possible signature, but with the caveat that “…the origin of the Iberian lineage and the inclusive identification of refugia remain open questions. Ancient DNA, especially from the Balkan regions, may help resolve this…”.

In their review of Baker et al. (2017), Marchesini et al. (2020) re-analyse various aspects of the Baker et al. (2017) study, however in doing so they incorrectly applied some methodologies leading to conclusions that have the potential to falsely weaken the conservation message from the earlier study. That is why we feel compelled to respond to their paper here. The key points of conservation importance from Baker et al. (2017; as indicated in their conclusions) are that there is strong population structure across the species range in Europe, and that within-population diversity is extremely low, despite the IUCN classification of this species and these populations as being of ‘least concern’ (Masseti and Mertzanidou 2008). Marchesini et al. (2020) take issue with sampling strategies, and suggest that duplicate samples and those from close kin were included by Baker et al. (2017), while the strategy employed to avoid these issues is summarised in Baker et al.’s Table 1 and Fig. 1. Marchesini et al. (2020) claim to find 48 genotypes that are shared among individuals (from 2 up to 10 individuals). Presumably this is based on 8 loci (see below; not specified in Marchesini et al. 2020), because for the 10 locus dataset there are 32 genotypes shared among individuals (1 shared among 4, 3 among 3, and the rest are pairs). It is not clear how Marchesini et al. (2020) are interpreting matching genotypes, though at one stage they do call them ‘duplicates’. However, most of these are from separate locations and further metadata (sex, collection date, etc.; unpublished, but available on request) excludes them from being credible duplicates. The level of replicate genotypes is also not unexpected given the low level of diversity within putative populations. Taking the Canadian sample set as an example (for which there are no missing data, and also no information to help distinguish individuals), there are 5 pairs of replicates. Based on the genotype frequency data, it is possible to calculate the probability that for this sample set these 5 replicates may occur by chance. Doing that exercise shows that the observed pattern is not significantly different from that expected by chance (χ2 = 0.22; p = 0.99). Repeating this for our largest sample set (England) shows that the 2 matches seen are not significantly more than the 1 expected by chance (χ2 = 0.33; p = 0.56; based on identity probabilities calculated in CERVUS; Marshall et al. 1998). Therefore, the replicates can evidently be explained by low diversity, and their removal would be inappropriate for any further analyses (as it would artificially inflate allele frequencies; see below). Even so, Marchesini et al. (2020) state that “96/358 individuals (27%) were therefore removed (duplicates) for the parentage analysis.”

Table 1 Mean and standard deviation for heterozygosity when 10 or 8 loci included, compared against values reported in Marchesini et al. (2020) Data calculated using Arlequin version 3.5.2 (http://cmpg.unibe.ch/software/arlequin35/)
Fig. 1
figure 1

a Structure (https://web.stanford.edu/group/pritchardlab/structure.html) plot for K = 10 from Baker et al. (2017) based on 10 loci, and b the K = 10 plot from Marchesini et al. (2020) based on 8 loci (the version represented in 16 of their 20 replicate runs)

Their parentage analysis was run in COLONY (Jones and Wang 2010) however, they “decided to perform COLONY analysis on the whole dataset (and not on separate populations). Following this decision, we acknowledge the risk of detecting false siblings (particularly among individuals from different populations)”. They justify this by citing recommendations in Rodríguez-Ramilo and Wang (2012). However, Rodríguez-Ramilo and Wang (2012) specifically investigated the impact of the inclusion of close kin on the outcome of analyses in STRUCTURE (Pritchard et al. 2000). Rodríguez-Ramilo and Wang (2012) observe that “COLONY may detect some false siblings, especially when individuals are taken from several highly differentiated populations”, and note that this is expected because “COLONY assumes a sample of individuals taken from a single subpopulation.” They state that identifying false siblings is especially a problem within subpopulations (not between populations as suggested by Marchesini et al. 2020): “When sampled from several well-differentiated subpopulations, nonsibs from within a subpopulation are still related (by an amount equivalent to FST) and could thus be mis-assigned as siblings.” They justify the risk for analysis in Structure in particular by saying that “removing falsely detected siblings merely reduces sample size”. In Baker et al. (2017) many of the microsatellite FST values comparing regions are high (see Table S3 in Baker et al. 2017), up to 0.745, which can be explained due to low variation within and fixed differences between populations. This means that pooling populations for the reference allele frequencies would inflate their diversity and provide false confidence for kin group assignments within populations. At the same time, using a within-population estimate, as appropriate, would provide too little power for the meaningful, significant identification of kin groups. The analyses associated with COLONY presented in Marchesini et al. (2020) are therefore invalid.

Marchesini et al. (2020) also remove loci for some re-analyses. From the original dataset from all loci by population comparisons there are 6 cases of significant heterozygote deficiency (after correcting for type 1 errors), one in Rhodes, England, Italy and Sweden, and two in Ireland. For each of six loci (OarFCB48, HAUT27, CSSM014, ETH2, ILST30, and TGLA127) there is one case of significant heterozygote deficiency. Among the two loci identified for removal by Marchesini et al. (2020), HAUT27 shows deficiency in Ireland, and CSSM014 in Sweden (both locations where a Wahlund effect due to mixed introductions from separate sources is possible). This pattern of deviation from HWE would typically not be sufficient to justify removing loci or population sample sets, since there is no consistent pattern, and the impact for each putative population is zero or 1 locus out of 10. Marchesini et al. (2020) also justify the removal of HAUT27 due to missing data, however at 6% this is not excessively high (range across loci is from 0.5% to 6%). They justify the removal of CSSM014 due to null alleles, but Baker et al. (2017) used Microchecker and do not report finding nulls. Marchesini et al. (2020) apparently used FreeNA to detect null alleles. The different outcome may be because Microchecker requires homozygote excess to be homogeneously distributed across all homozygote classes for nulls to be identified, and from the Dryad file, it can be seen that this is generally not the case for this dataset. More generally, although all methods work reasonably well, each produce some errors (especially false negatives), and the outcome varies among programs (see Dabrowski et al. 2014). Baker et al. (2017) report linkage disequilibrium (LD) between HAUT27 and BM4505, as do Marchesini et al. (2020), but the appropriate test is then to remove each locus in turn, to see if the coordinated genotypes identified in the LD analysis distorts the outcome of further analyses, as was done in Baker et al. (2017).

Modifications to the study based on these manipulations matter if they artificially distort the pattern and level of diversity. In their Table 1, Marchesini et al. (2020) present diversity metrics re-calculated for by-country values (broken down by region in Baker et al. 2017). They do not make clear if the proposed duplicates or kin have been removed (which would artificially inflate values) or if the calculations were based on 10 or 8 loci. In our Table 1, we compare the mean values provided in Marchesini et al. (2020) with mean and s.d. for 10 loci and 8 loci (omitting HAUT27 and CSSM014) from Baker et al. (2017). It is evident that Marchesini et al. (2020) used 8 loci, but retained all of the samples they had proposed to be duplicates or kin. The standard deviations illustrate the extent in variation among loci (not previously presented in either paper) and the consistently higher averages at 10 loci (apart from Rhodes) show that the loci omitted had relatively high diversity. However, FST comparisons using 8 (omitting HAUT27 and CSSM014) or 10 loci were highly correlated (linear regression: R2 = 0.985, F = 3427, p < 0.0001).

Marchesini et al. (2020) test the outcome of Structure analyses with HAUT27 and CSSM014 either included or removed and report little difference (“very similar outcome for K = 3 and inconsistency of results for K > 3 (data not shown)”). Again, they do not make clear if their analyses in Structure retained or removed the proposed duplicate samples or close kin. As mentioned, this modification in Structure was the point of the analysis in Rodríguez-Ramilo and Wang (2012) which they cite as inspiration for their re-analysis. The comparison in Fig. 1 shows the Structure plot for K = 10 from Baker et al. (2017) and the K = 10 plot from Marchesini et al. (2020) (the version represented in 16 of their 20 replicate runs). Note that these are essentially identical, though the plot in part b is for 8 rather than 10 loci. The comparison also indicates that putative duplicate/kin samples have not been removed, as the number of samples included is apparently the same. Taken together, these various assessments do not appear to justify removing these 2 loci from the broader analysis. The dominant pattern found by Marchesini et al. (2020) when K = 10 or when K = 5 (found in 13/20 replicate runs) only reinforces the coherence within and difference between Iberia, Italy and Turkey illustrated in Baker et al. (2017) from both Structure and ordination analyses (see Baker et al. 2017). The clustering of northern populations could reflect a similar pattern of admixture across that geographic range. Marchesini et al. (2020) do not mention the ordination analyses from Baker et al. (2017), but these provide the stronger inference since no assumptions need to be made about population equilibrium conditions.

In the end, the re-analyses by Marchesini et al. (2020) based on the inappropriate removal of loci and samples does not actually change the conservation-relevant results from Baker et al. (2017), the strong differentiation between regions, and the very low level of diversity within regions. The pattern of structure is strong and robust to these manipulations. In future it would be useful to study these populations using many more loci (e.g. from RADseq analyses), however it seems that 8 vs 10 loci has not made a significant difference, nor is it likely that increased resolution would change inference about strong population structure and comparatively low diversity within regions.

In their discussion, Marchesini et al. (2020) say that the “loci showed extremely low overall levels of polymorphism, even lower than those recorded for a single-population study, at 4 shared markers (Say et al. 2003): this may indicate that this dataset might be affected by some further biases such as improper sampling strategy and possibly high inbreeding (but see below). Besides this, such a very low informative panel of markers, with the majority of loci being monomorphic/dimorphic within populations, poses serious questions on the reliability of any evolutionary inference based on the derived population genetic results”. These conclusions from a comparison with Say et al. (2003) are unconvincing. The deer in that study were from Phoenix Park near Dublin, Ireland, founded with deer from the UK in 1662, followed by various introductions up to 1906. There was a bottleneck apparently during WWII, but the population quickly recovered to > 600 today. Therefore, this population, as for all northern populations, is a mixed assemblage of introduced deer from various regions, including Britain (which is itself a mixed assemblage). Their sample size was 349 and they found 21 alleles at the shared loci. Baker et al. (2017) had a sample size of 364 from multiple origins and 15 alleles at the shared loci. The difference is not significant (χ2 = 1.21; p = 0.272). In Baker et al. there are 9 alleles at the 4 shared loci from their Irish sample, but the sample size is an order of magnitude smaller, and so will be unlikely to have sampled the rarer alleles captured by Say et al. (2003). The number of alleles at the 10 loci in Baker et al. (2017) range from 2 to 7, while for Say et al. (2003) the range across their 20 loci is also 2 to 7 alleles. In general, there is not a compelling case that the diversity found in Say et al. (2003) is significantly greater than that found in Baker et al. (2017). So, there is no need to propose some sort of unidentified bias. However, the loci are nevertheless informative because of the extent of diversity among putative populations, and there is no reason to think that low diversity on its own would bias evolutionary inference, though without the strong divergence among populations it will limit the resolution of those comparisons.

Another issue raised by Marchesini et al. (2020) was the completeness of the mtDNA network. Baker et al. (2017) based their network on a 683-bp control region sequence and the extensive sample set collected for their study, together with 9 representative German samples that had been previously analysed in Ludwig et al. (2012), re-sequenced to match the 683-bp length. Marchesini et al. (2020) include a network based on more samples, but fewer informative sites (417 bp). The relative increase in sample size was achieved through the collation of freely available Genbank data (Baker et al., 2017; Ludwig et al. 2012), alongside their own data derived from Masseti et al. 2008 which was not made publicly available on Genbank until 2020 (and so not available to Baker et al. in 2017). It is of interest to see the new network, however, by reducing the fragment length of the consensus sequences Marchesini et al. (2020) miss important aspects of the data, including 13 informative sites and a 21-bp indel that is uniquely shared between Turkish and nearly all northern European introduced populations. The network they present is also lacking information on the relative frequency of each haplotype. In the end, the 2 networks provide very similar inference. Both show essentially 5 clusters and separate out most haplotypes from Rhodes and Turkey, and show some branch sharing between Italy and Iberia. Both also show shared haplotypes (the most common haplotypes as can be seen in Baker et al. 2017) between southern populations (Italy and Iberia) and the northern and foreign introduced populations (Canada, Northern and Central Europe). The separate ‘northern’ cluster likely reflects differential matriline extinctions and lineage sorting among the ancestral source populations and founder populations in the north. The larger number of samples increased the number of haplotypes, but at the shorter sequence length. The difference between their clade 2 (comprising 2 Italian, 1 Rhodian and 1 northern haplotypes) and our version (just one Italian haplotype) is interesting, but just how different would depend on haplotype frequency, which is not shown in their network. Therefore, rather than revealing some insight missed in the Baker et al. (2017) version, the Marchesini et al. (2020) network presents a less robust representation per haplotype, with more haplotypes, but no novel inference. Furthermore, possible interpretation needs to be considered in the context of a single gene tree, especially given likely issues with incomplete lineage sorting, and possible introgression (as discussed in Baker et al. 2017).

The point raised in Marchesini et al. (2020) that was central to their concerns was the possibility of influence from glacial refugia on the pattern of genetic diversity among modern populations. This is an interesting question, though Baker et al. (2017) make no claims to have fully resolved it. Marchesini et al. (2020) propose a test based on the expectation that refugial populations will be more diverse than post-expansion populations (see reference to the glacial refugia hypothesis), potentially affected by founder events and leading edge effects. However, there is no evidence for a post-glacial expansion into the north for this species. Instead, anthropogenic introductions moved these animals into the north generating mixed assemblages that may well be more diverse, or at least as diverse as their source populations. The test is therefore invalid. Both papers show that northern populations harbour higher genetic diversity than the last, undisputed autochthonous population at Düzlerçami in Turkey. Note however that the Düzlerçami population has faced near extirpation and numbers fewer than 100 individuals (Arslangündoğdu et al. 2010) which will have reduced variation. It is possible that in northern European populations, humans have unknowingly preserved some of the original genetic diversity of the source population, as was argued for Rhodian deer in Masseti et al. (2008).

Ordination analyses presented in Baker et al. (2017) showed that introduced populations in northern Europe, UK, Ireland, Sweden and Canada cluster together, while Iberia, Italy, Turkey and Rhodes form distinct clusters (e.g. Figure 5 in Baker et al. 2017). This pattern can also be seen in the Structure analyses, with extensive apparent admixture among most introduced populations in the north or overseas. Baker et al. (2017) observe that these strong clusters in the south reflect the geographic structures proposed as refugia for many European species during the Pleistocene glaciations (e.g. Hewitt 2000). Furthermore, although sample sites in Iberia are spread across the region (as far apart as 1,200 km), these all cluster together genetically, and separate from the populations in the other peninsulas (Italy and Anatolia), as seen in the Structure and ordination plots. Here, we analyse this further using an AMOVA where all regions were treated as a single population apart from Iberia which was subdivided into 5 regional subgroups (see Table 1 and Fig. 1 in Baker et al. 2017 for group designation). The results are presented in Table 2. Overall FST was high (0.53; as expected from pairwise FST values shown in Baker et al. 2017), while differentiation among subpopulations within groups (representing differentiation among the regions around the Iberian peninsula) was much lower (FSC = 0.018; Table 2). This is despite the fact that some populations in Iberia are actually farther apart than proximate populations in Iberia and Italy.

Table 2 AMOVA analysis for groups defined by country, and populations within the Iberian region. Analysis run in Arlequin version 3.5.2

It is not immediately clear why this pattern would exist if all modern regional populations were the result of anthropogenic translocations. We understand the difficulties with a glacial refugial interpretation from the zooarchaeological data, which is well reviewed in Marchesini et al.’s appendix S1. We took the clear differentiation between putative populations (see Fig. 2 for further illustration) and low diversity within them as support for a hypothesis about glacial refugial signatures to further test. We assessed the best supported model of division among these populations (including all possible combinations and only using samples with no missing data) in an approximate Bayesian computational (ABC) analysis. Marchesini et al. (2020) say that this is “a crucial point of Baker et al. (2017)”. However, it is not—the crucial point is the pattern of differentiation among the southern populations. While the model most strongly supported by the ABC analysis is a trifurcation with a mean division point within the glacial period (see analysis details and supporting illustrations in the supplement to Baker et al. 2017), Baker et al. (2017) do not propose that this, on its own, can confirm that a glacial refugial signal is the only explanation. As they explain, further analyses are needed, and especially recommend an analysis incorporating ancient DNA. We have now done this (Baker et al. in prep). The paper is not yet published, however the data presented do confirm the establishment of at least 2 distinct lineages likely during the end of the Pleistocene period, and these lineages represent clustered ancient and modern samples differentiating Italy and Anatolia in the earlier division, and Iberia from Italy possibly somewhat later.

Fig. 2
figure 2

Un-rooted neighbour-joining phylogeny based on 10 microsatellite DNA loci and Nei’s Ds genetic distances, with 100 bootstrap replications (> 50% node support shown) constructed using Phylip (https://evolution.genetics.washington.edu/phylip.html)

The idea of refugial fallow deer populations is not new, and has been previously promoted by some of the authors of Marchesini et al. (2020). For example, fallow deer are proposed to have taken refuge in 2 main locations in south-eastern Europe: Turkey and the Balkans (Masseti and Vernesi 2015; Karastoyanova et al. 2020). This is consistent with the phylogeographic patterns of other species native to this region where the Bosphorus strait acts as a biogeographic barrier between these 2 areas (see Bilgin 2011 and references therein). An argument for an Iberian refugia has not been previously made, as the archaeological appearance of fallow deer in Spain strictly coincided with the arrival of the Romans (Davis and MacKinnon, 2009) who began the widespread movement of this species across continental Europe (Pascal et al. 2006). Baker et al. (2017) acknowledge that the signal in this region may represent an early human-mediated translocation from Italy, or perhaps from a now extinct refugial source (possibly in the Balkans). Overall, we feel that the possible integration of genetic signatures from natural historical and anthropogenic processes remains worth further investigation for this species. We emphasise however that the key purpose and message of the Baker et al. (2017) study was to illustrate the pattern of variation and diversity among extant populations, and the conservation implications of those data. Nothing in Marchesini et al. (2020) diminished or altered that inference. There are parallels with large African mammals which are also often translocated and their populations fragmented (e.g. Benjamin-Fink & Reilly 2017). However, we have manipulated these populations, their survival continues to depend on the retention of sufficient diversity to avoid inbreeding depression and to permit adaptation to changing environments (e.g. Frankham 2005).