Background

Landscape-scale anthropogenic disturbance can cause habitat loss and fragmentation, thereby spatially isolating local wildlife populations and impeding functional connectivity [1,2,3,4]. Species that typically structure as metapopulations may be particularly threatened by spatial isolation of subpopulations. Local extirpations at patches can be common, and persistence of the metapopulation is dependent on ongoing recolonization events [5,6,7]. As such, interrupted dispersal and gene flow among habitat sites can decrease population-wide genetic variability and fitness, promote extirpation of naturally small subpopulations, prevent recolonization events and threaten metapopulation persistence [4, 5, 8, 9]. Allegheny woodrats (Neotoma magister) require rock habitats (e.g., cliff faces, talus slopes, boulder fields), located primarily in high elevation areas throughout the Appalachian Mountains and Interior Highlands [10, 11]. These habitat sites are naturally disjunct. As a result, woodrats typically form small subpopulations (defined by suitable rock habitat) that are connected by dispersal within a larger metapopulation [12, 13]. If movement amongst habitat patches is interrupted, subpopulations become isolated, gene flow is inhibited, genetic diversity is lost through drift, inbreeding depression occurs, population numbers decrease, and recolonization of extirpated sites declines [14,15,16].

New Jersey (NJ) is home to a single remnant population of Allegheny woodrats, located ~ 240 km from the nearest extant population in Pennsylvania. Individuals sampled in 2009, 2011, and 2012 and genotyped at 11 microsatellite loci had relatively low genetic variability, as indicated by allelic diversity and observed heterozygosity (Additional file 1). In response to conservation concerns associated with declining genetic diversity, New Jersey DEP Fish and Wildlife introduced six individuals from a genetically robust population in Pennsylvania in 2015, 2016, and 2017 under the assumption that, if translocated individuals reproduced, population numbers and genetic variability would increase (i.e., genetic rescue; [17,18,19,20]). However, identifying evidence of reproductive success and quantifying genetic variability are dependent on identifying a marker panel with suitable statistical power [21,22,23].

Panels of single-nucleotide polymorphisms (SNPs) can yield a low probability of identity (PID) (i.e., the likelihood that two randomly chosen individuals in a population will present seemingly identical genotypes; [21]), thus aiding accurate reconstruction of familial relationships [24,25,26], even when populations are inbred [23]. Even relatively small SNP panels (e.g., 58–109 markers) can ultimately perform as well as or better than small suites of microsatellites [23, 27,28,29,30]. To this end, we sequenced the Allegheny woodrat genome and annotated a draft genome assembly. We subsequently designed a 134-SNP panel incorporating both gene-associated and putatively neutral markers. We conducted preliminary analyses to explore whether the SNP assay provides greater statistical power for individual identification than a commonly used panel of microsatellite markers. The SNP loci were then used to evaluate changes in genetic diversity following translocations to New Jersey’s remnant population and to identify offspring of translocated individuals.

Results

Nuclear genome sequencing and SNPtype assay development

We generated 137.6 gigabases (Gb) of raw sequence data from N. magister, including 119.8 Gb from the paired-end (PE) library and 17.8 Gb from the mate-paired (MP) library (Additional File 2). Our draft nuclear genome assembly includes 60,789 scaffolds greater than 2000 basepairs (bp) in length. We used BUSCO v5 to evaluate completeness of the genome by identifying mammalia_odb10 orthologs, finding 77.9% of universal single-copy orthologs were complete (77.2% single copy, 0.7% duplicated), 8.7% fragmented and 13.4% missing.

We initially identified 627,421 high-quality SNPs. Of these, we selected 192 SNPs to include in a Fluidigm SNPtype assay. We subsequently excluded 58 loci for reasons outlined in the methods (e.g., data did not cluster into distinct homozygous and heterozygous states). Of the remaining 134 loci, at least 128 loci amplified for each of the 318 woodrats genotyped (Additional File 3). These loci were roughly divided between gene-associated (n = 72) and neutral markers (n = 62).

Probability of identity using microsatellite and SNP markers

Genotyping 50 woodrats captured in 2017 and 2019 in Adams County, Ohio (OH) at 11 microsatellite markers resulted in a PID of 4.0 × 10⁻⁵ and a probability of identity among siblings (PIDsib) of 9.8 × 10⁻³. By contrast, 134 SNP loci generate values of 5.0 × 10⁻²⁷ (PID) and 3.1 × 10⁻¹⁴ (PIDsib). If the more conservative data set of 70 loci is used, the PID is 1.9 × 10⁻¹³ and the PIDsib is 3.1 × 10⁻⁷ across all 50 individuals. Furthermore, our results indicate that a much smaller panel of SNPs might be utilized in subsequent studies to achieve a PID < 0.0001 (Additional File 4; a PID < 0.0001 is considered low enough to distinguish between even closely related individuals in most natural populations [31,32,33]). Given these results, all other samples were genotyped using just the SNP panel. Notably, there was a significant, positive relationship between the number of heterozygous microsatellite loci per individual and the number of heterozygous SNP loci per individual (linear regression: r² = 0.32, p < 0.0001, Additional File 5).

Genetic variability and reproductive success following translocations to the Palisades population

Parentage analysis revealed that the six woodrats translocated from Pennsylvania to New Jersey produced a minimum of thirteen offspring (Table 1). The female translocated in 2015 produced at least four offspring between 2016 and 2019. The male translocated in 2015 produced at least nine offspring, predominantly in 2016 (Table 1). The offspring of the 2015 translocated female and male produced at least thirteen offspring of their own between 2017 and 2019 (Table 1). We found no evidence from trapping and subsequent genotyping that the other four translocated individuals reproduced. The males translocated in 2016 and 2017 were confirmed dead within 11 and 1 weeks of release, respectively. A camera captured footage of the female translocated in 2016 with a pup. It is unclear whether this pup died before reaching adulthood or simply avoided trapping, as the location in which it was photographed was outside of the regular trapping area. The female translocated in 2017 also settled outside of the regular trapping area and was not detected again.

Table 1 Offspring of individuals translocated from Pennsylvania in 2015, 2016 and 2017 and their offspring

Of the 82 tissue samples collected from the Palisades population, five were collected in 2009, thirteen in 2011, nine in 2015, eighteen in 2016, sixteen in 2017, eight in 2018 and thirteen in 2019. Once a Bonferroni correction was applied, a single locus was found to be out of Hardy–Weinberg equilibrium, and only in 2019 (exhibiting evidence of heterozygote excess). STRUCTURE analysis of resident individuals and those translocated to NJ provided evidence of two genetically distinct clusters when the most conservative data set (70 loci) was utilized. The population-wide genetic composition changed following translocation events in 2015, as illustrated by a shift from the blue cluster associated with the resident population prior to human-mediated gene flow, to an increase in the orange cluster associated with the genetic profiles of the PA individuals (Fig. 1). Despite this, all alleles historically present at the loci considered in this study were retained following translocations (data not shown).

Fig. 1

STRUCTURE results for 82 woodrats trapped in the Palisades, NJ between 2009 and 2019, as well as six individuals translocated from PA to NJ. PA individuals are labeled with the years in which they were released in NJ (i.e., 2015, 2016 and 2017). All individuals are labeled as being sampled before translocations occurred (“pre-translocation”), during translocations or after translocations occurred (“post-translocation”). STRUCTURE results were CLUMPP-averaged across 10 runs when K is assumed to be equal to two. Admixture is indicated by a shift from the blue cluster associated with the resident population prior to human-mediated gene flow, to an increase in the orange cluster associated with the genetic profiles of the PA individuals

Prior to translocations, observed heterozygosity (HO) and expected heterozygosity (HE) were substantially lower in New Jersey than at sites in Indiana (IN) and Ohio (Table 2). However, genetic variability in the New Jersey population increased notably in the years following translocation (Table 2, Table 3, Fig. 2). For example, observed heterozygosity increased from 0.08 ± 0.02 in 2009 to 0.30 ± 0.02 in 2019 (Table 3, Fig. 2). Average HO and HE were comparable in Indiana, Ohio and (post-translocation) New Jersey (Tables 2, 3, Fig. 2).

Table 2 Mean observed heterozygosity (HO) ± SE, mean expected heterozygosity (HE) ± SE for Allegheny woodrats (Neotoma magister) genotyped at 134 SNPs
Table 3 Mean observed heterozygosity (HO) ± SE, mean expected heterozygosity (HE) ± SE and mean number of alleles (A) for Allegheny woodrats (Neotoma magister) captured in 2009 (n = 5), 2011 (n = 13), 2015 (n = 9), 2016 (n = 18), 2017 (n = 16), 2018 (n = 8) and 2019 (n = 13) in the Palisades, NJ and genotyped at 134 SNP loci
Fig. 2

Capture index and mean observed heterozygosity (HO) ± SE for Allegheny woodrats (Neotoma magister) captured in 2009 (n = 5), 2011 (n = 13), 2015 (n = 9), 2016 (n = 18), 2017 (n = 16), 2018 (n = 8) and 2019 (n = 13) in the Palisades, NJ and genotyped at 134 SNP loci. Observed heterozygosity increased following translocations of six woodrats from Pennsylvania in 2015, 2016 and 2017

We identified 20 publications for which a Fluidigm SNPtype assay was used to genotype individuals at relatively few loci (38–192 SNPs) and HO and/or HE were reported (Table 4). Species described were members of Actinopterygii, Aves, Bivalvia, and Mammalia and were predominantly considered of “Least concern” by the IUCN. Across studies, HO and HE ranged from 0.13 to 0.45 and 0.14 to 0.42, respectively (Table 4). The vast majority of HO and HE estimates for species characterized as “Least concern” (all but two) fell between 0.25 and 0.37. Median HO and HE were 0.32 and 0.31, respectively.

Table 4 Metrics of genetic variability, sample size and IUCN status for species genotyped using the Fluidigm® BioMark HD™ Genotyping System

Discussion

Genome sequencing and SNP assay development

The data described herein represent only the second time a member of the genus Neotoma has undergone whole-genome sequencing [34]. Genetic resources for Neotoma magister are particularly limited [35], yet even low coverage sequencing can be used to generate tools that inform management of threatened species. For example, two lanes of paired-end sequencing and one lane of mate-paired sequencing enabled assembly of the complete mitochondrial genome [35] and identification of the 134 SNP loci described in this manuscript. Studies have shown that in some cases SNP genotyping can reveal fine-scale population structure, provide evidence of differential selection amongst populations and estimate genome-wide heterozygosity better than other marker panels [36,37,38,39,40,44]. Genotyping 50 woodrats using both microsatellite and SNP loci indicates that our SNP assay provides increased statistical power for analyses. Furthermore, PIDsib estimates are also very low, indicating the panel can be used to distinguish between woodrats even when related individuals are present in the population [31,32,33]. Given the spatial isolation of many woodrat populations (e.g., the remnant NJ Palisades population), the presence of closely related individuals should be assumed. Finally, DNA extracted from naturally shed feathers, hair and fecal samples and subsequent Fluidigm SNP genotyping has been used to identify individual golden eagles [24], wolves, wildcats, and bears [32, 33]. Given the low PID estimates associated with this assay, we anticipate a similar approach could be used to non-invasively monitor Allegheny woodrat populations from hair or fecal samples.

Temporal shifts in genetic variability following translocations to New Jersey’s remnant population

Conservation managers have long worried that translocations across extended geographic distances would result in relatively greater genetic distance amongst introduced and resident individuals, increasing the risk of outbreeding depression [45]. Recent studies, however, indicate that outbreeding depression rarely has negative impacts on the success of translocation programs [19, 46]. Furthermore, factors such as the genetic diversity of translocated individuals may be better predictors of fitness following introduction to a novel population than genetic distance [47]. Despite the relatively great geographic distance between source and resident populations inherent in this study, successful reproduction by translocated individuals clearly drove increases in genetic variability in subsequent years. Parentage analysis provides evidence that at least two woodrats translocated from Pennsylvania to New Jersey in 2015 went on to reproduce, as did their offspring. Increases in HO and HE were apparent as soon as 2016, making observed levels comparable to those among woodrat populations in IN and OH, and persisted through the end of the monitoring period in 2019. We also compared genetic variability in the NJ population to that of other species and determined that observed heterozygosity of New Jersey’s woodrats caught before 2015 was notably lower than any species listed as “Least concern” by the IUCN. Following translocations, HO and HE for the NJ population fall within the range of estimates generated across species. Increased abundance since 2015 provides additional evidence of potential genetic rescue. As such, this study joins relatively few in providing evidence of an increase in population size or growth rate following assisted gene flow (reviewed in [19]).

Management implications

Ongoing research on the conservation of Allegheny woodrats may inform best practices in translocating individuals to very small populations. Management guidelines recommend translocating a number of non-resident individuals that represents 20% of the recipient population to minimize the likelihood of swamping out local adaptive genetic variation [17, 20]. Genetic swamping (i.e., the rapid increase in frequency of alleles introduced by gene flow; [48, 49]) can result in the loss of private alleles within the recipient population. This, in turn, can lead to a loss of species-wide allelic diversity [49], even as the resident population’s genetic diversity increases. Efforts to minimize genetic swamping can lead to translocating very few individuals when recipient populations have low abundance. This study joins others in suggesting that successful reproduction by just one to three non-resident individuals can promote increased genetic diversity and abundance [50,51,52,53]. In some cases, however, these few immigrants achieve substantially elevated reproductive success in comparison to resident individuals, contributing to inbreeding in subsequent generations (e.g., arctic foxes, [51]; wolves, [54,55,56]). Even in the absence of direct observations of inbreeding, reproductive skew is known to decrease effective population size and result in the accelerated loss of genetic diversity due to drift [57]. Disproportionate reproductive success by translocated individuals may prove to be common in Allegheny woodrats if sex ratios are skewed in small populations [50], resident individuals prefer to mate with translocated individuals as an inbreeding avoidance mechanism [58], or F1 offspring have increased fitness stemming from heterosis [59,60,61]. Both this study and Davis et al. [50] document observations of translocated male Allegheny woodrats siring 39% and 35% of young trapped in the subsequent season, in their respective populations. A few known instances of inbreeding amongst relatives followed in subsequent generations (Table 2, [50]) but, encouragingly, coincided with stable or increasing population-wide genetic variability and abundance. Conservation managers monitoring small populations might consider genotyping individuals, conducting parentage analyses, and monitoring genetic diversity on a bi-yearly or yearly basis. This would allow for rapid translocation of additional individuals to small populations if elevated reproductive success seems likely to lead to inbreeding events, or to supplement previous, unsuccessful attempts at genetic rescue (i.e., if non-resident survivorship is low). It is worth noting that even in the absence of inbreeding, truly isolated populations (like that of the Palisades) will ultimately require additional human-mediated gene flow to counteract loss of genetic diversity due to genetic drift.

There are additional ways in which ongoing studies of woodrat translocations have the potential to add depth to our understanding of genetic rescue and restoration. Studies of genetic rescue have typically considered populations as discrete units, uninterrupted by landscape features. The natural tendency for woodrats to exist in metapopulations gives scientists the opportunity to study genetic rescue throughout heterogeneous landscapes and, in particular, how alleles introduced at one habitat patch have the potential to move amongst sites. Just as recent studies have proposed choosing specific individuals for their ability to reduce inbreeding depression in genetically depauperate populations [19, 62], specific habitat sites might be targeted for releases if they are connected by natural dispersal corridors to other portions of the metapopulation. Indeed, recent work on the landscape genetics of Virginia’s Allegheny woodrats suggests that low elevation, rather than anthropogenic barriers such as roads, might prevent translocated individuals and/or their offspring from dispersing amongst habitat sites [63].

Conclusions

Herein, we describe a novel SNP assay, which provides increased statistical power to studies of a species commonly found in small and consanguineous populations. Our study has important implications for remnant populations of threatened species that are geographically isolated from the nearest metapopulation. Translocating small numbers of individuals to very small populations may increase the risk of reproductive skew followed by genetic drift and inbreeding, necessitating increased monitoring following introductions. Despite this, human-mediated gene flow is likely to be integral to the persistence of remnant populations. Our results indicate, encouragingly, that small numbers of introduced, genetically variable individuals can successfully reproduce, increase population-wide genetic diversity, and facilitate increased abundance.

Methods

Genome assembly and annotation

We extracted deoxyribonucleic acid (DNA) from a tail clip of a single N. magister individual by pairing commercially available extraction (DNeasy Blood and Tissue, Qiagen, Venlo, the Netherlands) and clean-up (DNA Clean & Concentrator, Zymo Research, Irvine, California) kits in accord with the manufacturers’ instructions. We conducted three lanes of paired-end and one lane of mate-paired sequencing using an Illumina HiSeq2500. We used Trimmomatic [64] to remove adaptors, discard short reads and trim poor quality bases from 5′ and 3′ ends of raw sequence reads as described in Schofield et al. [35]. We used ABySS 1.9.0 [65] to conduct several assemblies with kmer lengths ranging from 40 to 85. PE reads were used to generate contigs. MP reads were used to infer the order, orientation, and distance between contigs, linking them together in scaffolds. The assembly with the greatest N50 value and longest scaffold was used for downstream analyses. BUSCO v5 [66], implemented by gVolante [67], was used to evaluate completeness of the genome.

We used the MAKER 2.28 pipeline [68] to annotate all scaffolds greater than 10 kb, following the methods described in Doyle et al. [69] and Doyle et al. [26]. To briefly summarize, we first used RepeatMasker to identify and mask stretches of repetitive DNA. Second, we downloaded 6762 Mus musculus protein sequences from the UniProtKB database (www.uniprot.org) and used the protein2genome setting in MAKER to generate gene annotations. These annotations were subsequently used to train SNAP [70] and generate ab initio predictions. Third, we aligned protein sequences and 93,400 Mus musculus expressed sequence tag (EST) sequences to the genome using BLAST and used InterProScan to identify putative protein domains. Finally, all ab initio gene predictions supported by protein, EST or InterProScan evidence were promoted to gene annotations.

SNP discovery and assay design

We identified SNPs as in Doyle et al. [26]. Briefly, we aligned paired-end reads back to the draft genome assembly using BWA 0.7.12 [71] and used Picard 2.3 (http://broadinstitute.github.io/picard) to sort and identify duplicate reads. We used the GATK 3.6 pipeline [72, 73] to realign reads around indels and identify high quality SNPs with a Phred quality score ≥ 30. We then selected 95 autosomal nuclear markers associated with gene deserts (i.e., “neutral” markers) and 97 autosomal nuclear markers associated with protein-coding genes. We deliberately chose no more than one SNP of each category from a given scaffold to minimize linkage disequilibrium. To identify neutral markers, we quantified the distances between all SNPs and genes using the BEDtools suite [74], ultimately choosing markers at the 95th percentile distance from genes. We used SnpEff 4.3 [75] to find SNPs associated with non-synonymous changes in the exonic regions of genes (i.e., “gene-associated” markers). IGV 2.3 [76] was used to confirm that at least 60 nucleotides of high-quality flanking sequence were present upstream and downstream of the marker, that guanine-cytosine (GC) content was less than 65%, and that no other variable sites were present within 20 nucleotides.
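The distance-based screen for neutral markers can be illustrated with a short sketch. This is not the authors' pipeline: it assumes SNP-to-gene distances have already been computed (e.g., with BEDtools), the function name and toy values are hypothetical, and it assumes that choosing markers "at the 95th percentile" means retaining SNPs at or beyond that distance.

```python
import numpy as np

def neutral_candidates(snp_ids, dist_to_gene_bp, pct=95.0):
    """Return SNP ids at or beyond the pct-th percentile of distance to the nearest gene.

    snp_ids and dist_to_gene_bp are parallel sequences (hypothetical inputs).
    """
    d = np.asarray(dist_to_gene_bp, dtype=float)
    cutoff = np.percentile(d, pct)  # linear interpolation by default
    keep = [s for s, x in zip(snp_ids, d) if x >= cutoff]
    return keep, cutoff

# Toy data: 20 SNPs located 1 kb to 20 kb from the nearest gene.
ids = [f"snp{i}" for i in range(1, 21)]
dists = [i * 1000 for i in range(1, 21)]
keep, cutoff = neutral_candidates(ids, dists)
```

In practice the retained candidates would still need the flanking-sequence, GC-content and nearby-variant checks described above before being included in an assay.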

DNA extraction and SNP genotyping

We trapped and subsequently genotyped 82 woodrats sampled from the Palisades, NJ between 2009 and 2019 (Tables 2, 3), including 18 and 64 individuals sampled before and after translocations began, respectively. We followed standard live-trapping protocols [77, 78] and collected a 2-mm ear punch from each individual, which was preserved in 70–100% ethanol. For each year trapping occurred, we calculated a capture index by dividing the number of unique individuals caught by the number of trap nights and multiplying by 10 [79]. Calculating a capture index allows us to control for differences in the number of nights trapping occurred across years [77]. DNA extractions were performed using an ammonium acetate protocol [14] or the Zymo Quick-DNA Miniprep Plus Kit. We used the Fluidigm® BioMark HD™ Genotyping System to genotype these individuals. Additionally, we genotyped the six individuals translocated from Pennsylvania to New Jersey between 2015 and 2017. Finally, we opportunistically genotyped 172 and 58 samples collected from Indiana and Ohio, respectively (Table 2). These samples were collected between 2015 and 2019 as part of long-term monitoring studies.
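The capture index described in this paragraph is simple arithmetic, sketched here for clarity. The function name and the example numbers are hypothetical and are not taken from the trapping data.

```python
def capture_index(unique_individuals, trap_nights, scale=10):
    """Unique individuals caught per trap night, multiplied by a scaling factor (10 in the text)."""
    if trap_nights <= 0:
        raise ValueError("trap_nights must be positive")
    return unique_individuals / trap_nights * scale

# Hypothetical season: 13 unique woodrats caught over 260 trap nights.
example_index = capture_index(13, 260)
```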

SNP calls were edited using the Fluidigm® Genotyping Analysis Software. We excluded markers from downstream analysis when data did not cluster into distinct homozygous and heterozygous states or if minor allele frequencies were less than 0.025. We used chi-squared tests implemented by GenAlEx [24, 26] to test for departures from Hardy–Weinberg equilibrium. Following Bonferroni correction, a single locus was found to be consistently out of Hardy–Weinberg equilibrium across years and was omitted, leaving 134 loci. We excluded individuals from analysis if ≥ 7 loci were not successfully genotyped, as call rates tend to be negatively correlated with genotyping errors [24, 32]. Using snpStats [80], we identified a number of markers in linkage disequilibrium but assumed that in many cases this was due to consanguinity, rather than two markers being in close proximity along the genome. However, to meet the assumptions of Cervus 3.0.7 [81] and STRUCTURE 2.3.4 [82, 83], we identified all pair-wise comparisons with r² > 0.2 and removed one marker in each case, creating a reduced dataset of 70 loci in which all SNPs are in linkage equilibrium.
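The pruning step can be sketched as a greedy filter on pairwise r². This is an illustration under stated assumptions rather than the snpStats procedure used here: genotypes are coded as 0/1/2 allele counts with no missing data, every locus is polymorphic, r² is approximated by the squared Pearson correlation between genotype vectors, and the rule for which marker of a pair to drop (always the later one) is a hypothetical choice.

```python
import numpy as np

def prune_ld(genotypes, r2_max=0.2):
    """Greedily retain loci so that no retained pair has r^2 above r2_max.

    genotypes: individuals x loci array of 0/1/2 allele counts
    (assumes no missing data and no monomorphic loci).
    Returns the column indices of retained loci.
    """
    g = np.asarray(genotypes, dtype=float)
    r2 = np.corrcoef(g, rowvar=False) ** 2  # loci x loci matrix of squared correlations
    kept = []
    for j in range(g.shape[1]):
        # Keep locus j only if it is not in LD with any locus already kept.
        if all(r2[j, k] <= r2_max for k in kept):
            kept.append(j)
    return kept

# Toy data: locus 1 duplicates locus 0; locus 2 is uncorrelated with locus 0.
toy = [[0, 0, 1], [1, 1, 0], [2, 2, 1], [0, 0, 1], [1, 1, 2], [2, 2, 1]]
kept = prune_ld(toy)
```

Because the order of loci determines which marker of a correlated pair is dropped, a different ordering can yield a different reduced panel of the same quality.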

Probability of identity using microsatellite and SNP markers

Previous studies of N. magister utilized relatively small panels of 11–22 microsatellite markers (e.g., [15, 63, 84, 85]). To evaluate the statistical power associated with each approach, we genotyped 50 woodrats captured in 2017 and 2019 in Adams County, OH at both 11 microsatellites and 134 SNP loci (Additional File 3 and Additional File 4). We subsequently calculated the probability that two randomly chosen individuals in the population would have identical genotypes (PID), using each marker panel. We additionally calculated PIDsib, which represents a conservative upper bound for the likelihood that two individuals sampled from a population will have the same genotype by chance [32, 33]. This estimate is particularly useful when substructure is present in the population (i.e., related individuals; [32, 33]).
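For readers unfamiliar with these statistics, the sketch below implements commonly used per-locus expressions for PID and PIDsib under random mating and multiplies them across loci. It is an assumption that these match the exact estimators (e.g., any sample-size correction) used to produce the values reported here; the function names are hypothetical, and the multi-locus product assumes independent loci.

```python
import math

def pid_locus(freqs):
    """Per-locus PID: sum_i p_i^4 + sum_{i<j} (2 p_i p_j)^2."""
    hom = sum(p ** 4 for p in freqs)
    het = sum((2 * freqs[i] * freqs[j]) ** 2
              for i in range(len(freqs))
              for j in range(i + 1, len(freqs)))
    return hom + het

def pidsib_locus(freqs):
    """Per-locus PIDsib: 0.25 + 0.5*S2 + 0.5*S2^2 - 0.25*S4, with Sk = sum_i p_i^k."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4

def multilocus(per_locus_values):
    """Multi-locus probability, assuming loci are independent."""
    return math.prod(per_locus_values)

# Hypothetical biallelic SNP with both alleles at frequency 0.5.
snp = [0.5, 0.5]
pid_one = pid_locus(snp)              # 0.375
pidsib_one = pidsib_locus(snp)        # 0.59375
pid_ten = multilocus([pid_one] * 10)  # falls below 0.0001
```

Under these idealized frequencies, ten maximally informative SNPs are enough to push PID below 0.0001, whereas nine are not; real loci with skewed allele frequencies contribute less per marker.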

Reproductive success and genetic variability following translocations to the Palisades

We used Cervus 3.0.7 [81] to assign individuals sampled between 2015 and 2019 to dams and sires. For individuals trapped for the first time in each year, all woodrats trapped in that same year and in all previous years were considered candidate parents. Simulations included 100,000 replicate cycles. The proportion of candidate dams and sires sampled was estimated to be 0.80, based on the probability of capture estimated from comparable trapping approaches of other woodrat populations [16]. The proportion of typed loci was 0.99 and the proportion of loci mistyped was set to 0.04 [26]. The minimum confidence level for parentage assignment was 95%.

STRUCTURE 2.3.4 [82, 83], STRUCTURE HARVESTER 0.6.94 [86], and Clumpak [87] were used to visualize admixture in the Palisades, NJ woodrat population across time. We utilized the reduced dataset (i.e., with all 70 loci in linkage equilibrium, see above) for 82 individuals sampled between 2009 and 2019, as well as the six individuals translocated from PA. We considered values of K = 1–8, running each value 10 times with an initial burn-in of 100,000 Markov chain Monte Carlo (MCMC) iterations and 1,000,000 subsequent iterations for each value. We assumed an admixture ancestry model and allowed for correlated allele frequencies [82]. The results were interpreted using mean likelihood values of K and ΔK [86].
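The ΔK criterion used to interpret these runs can be sketched as follows. This is a minimal illustration, assuming ΔK is computed from the run means as |L(K+1) − 2L(K) + L(K−1)| divided by the standard deviation of L(K) across replicate runs; the function name and the toy likelihoods are hypothetical, and the exact variant implemented by the software used here may differ.

```python
import statistics

def delta_k(lnp_by_k):
    """Compute Delta K for each K that has both neighbours (K-1 and K+1).

    lnp_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    """
    mean = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    sd = {k: statistics.stdev(v) for k, v in lnp_by_k.items()}
    out = {}
    for k in sorted(lnp_by_k):
        # Delta K is undefined at the smallest and largest K tested.
        if (k - 1) in lnp_by_k and (k + 1) in lnp_by_k and sd[k] > 0:
            out[k] = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1]) / sd[k]
    return out

# Toy likelihoods for K = 1..4, three replicate runs each.
toy_lnp = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4490, -4495, -4485],
    4: [-4488, -4480, -4496],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
```

Note that ΔK cannot be evaluated at K = 1, which is one reason mean likelihood values are inspected alongside it.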

We used GenAlEx [88] to calculate allele frequencies and expected and observed heterozygosity in the years before (2009, 2011), during (2015–2017) and after translocations (2018, 2019) to the Palisades, NJ population. To provide context for our interpretation of temporal changes in genetic variability in the Palisades population, we (1) used GenAlEx [88] to calculate allele frequencies and expected and observed heterozygosity in Indiana, New Jersey and Ohio and (2) surveyed the literature for estimates of observed and expected heterozygosity generated using the Fluidigm® BioMark HD™ Genotyping System and relatively small SNP assays (e.g., 96–192 loci). To conduct our literature review, we searched for the phrases “SNP type assay”, “SNPtype assay”, “Fluidigm SNP assay”, “Fluidigm SNP chip” and “Fluidigm Genotyping Analysis Software” in Google Scholar. For all studies of non-human animals for which observed and/or expected heterozygosity were described, we recorded the number of loci in the assay, sample size, metrics of genetic variability and International Union for Conservation of Nature (IUCN) status. If average HO and HE were not provided, we averaged across per-locus values or population-specific values when able. When interpreting these results, an important caveat is that we do not have a robust understanding of how SNP HE, HO and allelic diversity vary with categorizations such as body size, conservation status, habitat, migratory behavior, taxonomic group and trophic class. In contrast, these relationships have been extensively studied utilizing estimates of genetic variation generated with microsatellite loci [89,90,91,92,93].
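The heterozygosity metrics referred to above can be sketched from their textbook definitions: HO is the proportion of heterozygous genotypes at a locus, and HE is one minus the sum of squared allele frequencies. Whether these match the specific GenAlEx output used here (e.g., biased versus unbiased HE) is an assumption, and the function names and toy genotypes are hypothetical.

```python
import math
import statistics

def heterozygosity(genotypes):
    """Return (HO, HE) for one locus.

    genotypes: list of (allele, allele) tuples; None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this locus")
    ho = sum(a != b for a, b in called) / n
    counts = {}
    for a, b in called:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    he = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return ho, he

def mean_se(values):
    """Mean and standard error across loci (requires at least two loci)."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Two toy loci typed in four individuals (one missing call at the second locus).
locus1 = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")]
locus2 = [("C", "C"), ("C", "C"), ("C", "T"), None]
ho1, he1 = heterozygosity(locus1)
ho2, he2 = heterozygosity(locus2)
mean_ho, se_ho = mean_se([ho1, ho2])
```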