Introduction

Assessing ungulate population size and density is of central importance for monitoring programmes and management decisions (Smart et al. 2004). Because many ungulate species show elusive behaviour and occur in habitats with dense vegetation, this is a challenging task for both rare (Barrio 2007; De Oliveira et al. 2019) and overabundant species, which can cause damages to silviculture (Putman and Moore 1998; Gordon et al. 2004; Allombert et al. 2005). Especially, management of red deer (Cervus elaphus) is of increasing interest because populations are spreading and increasing in many parts of Europe (Ward 2005; Milner et al. 2006; Mysterud et al. 2007; Acevedo et al. 2008). Although of high value for recreational, trophy or meat hunting, foresters complain of economic losses due to bark stripping or browsing effects of this large herbivore. However, red deer populations are particularly difficult to survey in forested areas because direct counts are difficult or yield only indices (Garel et al. 2010) and indirect methods like, e.g. pellet counts yield imprecise results (reviewed in Smart et al. 2004; Grignolio et al. 2020).

Non-invasive genetic approaches represent a powerful tool for population size and density estimation of animals that are elusive and therefore difficult to survey (Woods et al. 1999; Beja-Pereira et al. 2009; Ferreira et al. 2018; Skrbinšek et al. 2019). DNA contained in hair or faecal samples can be used for identification of individual animals, and these data can be used for population size estimation, e.g. implemented in a capture-recapture (CR) framework. Capture-recapture can yield accurate and precise population estimates (Otis et al. 1978; Seber 1982; Pollock et al. 1990), and non-invasive genetic CR offers the advantage that animals do not have to be captured physically, but are detected through their genotypes. This can reduce some sources of bias that are problematic in traditional CR, particularly those related to the behaviour of animals towards being captured like, e.g. ‘trap-shyness’ (McKelvey and Schwartz 2004; Petit and Valière 2006). However, several issues remain that can compromise non-invasive population size estimates and thus have to be carefully taken into account when conducting a study (Boulanger et al. 2004; Lukacs & Burnham 2005b; Ebert et al. 2010; Marucco et al. 2011). Detection probabilities may be heterogeneous amongst individuals or vary over time, and violation of the assumption of population closure can impede the definition of the effectively sampled area (ESA) relevant for the assessment of population density (Boulanger and McLellan 2001; Boulanger et al. 2004). Markers used to discriminate individuals have to be sufficiently informative; otherwise, a genotype may not represent a unique ‘mark’ (Pompanon et al. 2005). Genotyping errors such as allelic dropout or false alleles can lead to misidentification of individuals and thus to overestimation of population sizes (Prugh et al. 2005; Waits and Paetkau 2005). Therefore, care must be taken to reduce genotyping errors, e.g. by repeating analyses several times and by using effective error-checking protocols (Broquet and Petit 2004; McKelvey and Schwartz 2004).

Non-invasive population size estimation approaches have to date been applied for several ungulate species, amongst them different Odocoileus species (e.g. Brinkman et al. (2011) for Sitka black-tailed deer Odocoileus hemionus sitkensis in Alaska; Goode et al. (2014) and Brazeal et al. (2017) for mule deer O. hemionus and O. h. columbianus), argali sheep Ovis ammon (Harris et al. 2010), Sonoran pronghorn Antilocapra americana sonoriensis (Woodruff et al. 2016), roe deer Capreolus capreolus (Ebert et al. 2012) and mountain goats Oreamnos americanus (Poole et al. 2011). For red deer, a first evaluation of the technical background for non-invasive population monitoring was carried out in a pilot study (Valière et al. 2007). However, to our knowledge, the study presented here is the first successful attempt of applying this method on free-ranging red deer populations. We carried out non-invasive sampling of red deer populations in three different study areas in south-western Germany. All three areas are situated within large forest districts that are of high importance for silviculture and harbour large managed red deer populations. Browsing and bark stripping damages have been an ongoing issue, resulting in the need for an effective monitoring of the local red deer populations. As in many other areas in Germany, deer management goals are diverse, including regulating population size for damage control as well as meat and/or trophy hunting. Our aim was to investigate the applicability of the non-invasive genetic approach to monitor red deer, particularly with respect to sample size, genotyping reliability and potential sources of bias, but also to its costs.

Material and methods

Study areas and field sampling

We conducted our study in three mainly forested areas, the Hunsrück (HU, sampling in March 2012), Soonwald (SO, sampling in March 2015) and Pfälzerwald (PF, sampling in March 2016; Fig. 1), situated in the federal state of Rhineland-Palatinate in south-western Germany. Each study area approximates 10,000–11,500 ha in size (Table 1) and is located within a larger forested area continuously populated by red deer. Elevation within the areas ranges from approx. 200 to 700 m above sea level. Forest cover is lowest in the SO area (83%) and higher in the PF (93%) and HU (95%) areas, with Pinus sp., Fagus sylvatica, Picea abies and Quercus petraea as well as Quercus robur as dominant tree species. Annual average temperature is 8–9 °C; annual precipitation is approximately 600–1000 mm. Three ungulate species occur in the study areas: red deer, roe deer (C. capreolus) and wild boar (Sus scrofa). Red deer are hunted during drive hunts, involving approx. 30–50 shooters and beaters and often dogs, between mid-October and the end of January. Single hunt, as a sit-and-wait-strategy of a single hunter using a high seat, is carried out between May 1 and January 31; hunting season differs between sex and age classes. Due to official but unpublished harvest records (every successful hunt has to be reported to the local hunting services), the average yearly hunting bag sums up to 1.8 harvested red deer per km2 and year in the HU area, 3.1 per km2 and year in the SO area and 1.1 per km2 and year in the PF area (mean over 5–9 years preceding the respective faeces collection; for details, see Fig. 2). Hunting quotas are based on damage levels. In case of unacceptable damage levels (acceptable levels, e.g. winter browsing rate for terminal shoots in regenerated tree species < 15–20% or last-year bark stripping rate below 1%, are fixed by Rhineland-Palatinate hunting regulations, § 31 game law of the federal state), quotas are increased.

Fig. 1
figure 1

Locations and transect designs of the three study areas in which red deer faeces were collected for non-invasive genetic population density estimation (HU Hunsrück area, SO Soonwald area, PF Pfälzerwald area). Transects are given as thin black lines, bold outlines indicate buffered study area boundaries, and black dots indicate locations of collected red deer faeces

Table 1 Overview over the three red deer faecal sampling trials in the areas Hunsrück (HU), Soonwald (SO) and Pfälzerwald (PF)
Fig. 2
figure 2

Sum of the annually achieved hunting bags per km2 of hunting area in three study areas in south-western Germany (PF Pfälzerwald, HU Hunsrück, SO Soonwald). Hunting area is almost equal to forest area due to high forest coverage

Red deer faeces are easily distinguishable from wild boar faeces due to their differences in form, colour and texture. Roe deer faeces are similar to those of red deer in form and texture, but normally differ considerably in size and length/diameter ratio and thus can be distinguished in most cases (Mayle et al. 1999; Daniels 2006). However, a roe deer sample erroneously collected would be detected after genotyping because the microsatellites we used for red deer perform differently for roe deer: BM203, CSSM16 and BMC1009 show distinct alleles, whereas Haut14, IDVGA55, CSSM19 and TGLA53 do not amplify. AMELXY shows a different Y allele for male roe deer (236 bp) compared to red deer (228 bp).

We collected faeces during March after snowmelt, but before ground vegetation starts growing and with still rather low ambient temperatures in order to ensure adequate conditions concerning faecal detection and DNA quality (Lucchini et al. 2002; Maudet et al. 2004; Rea et al. 2016; Woodruff et al. 2015). In each area, transects were distributed as evenly as possible to represent different habitat types, microhabitats and altitudinal levels. Transects were spaced between 120 and 250 m from each other (Fig. 1). In the three areas, between 60 and 80 transects with lengths between 4.5 and 10 km were searched for red deer faeces, representing a total transect length of 2022 km (Table 1). The transect width, which a walking person could effectively search for faeces, was approximately 1.5 to 3 m depending on ground vegetation and slope of the transect.

The locations of all detected red deer faeces were recorded using handheld GPS devices. All field workers were trained to distinguish faeces between the different species and to evaluate the freshness of faeces. In order to increase the genotyping success rate, only fresh faeces (i.e. intact pellets with moist, shiny surfaces) were collected for analysis. To minimize cross-contamination, we sampled only faeces that were spatially segregated from other red deer faeces and thus were assumed to originate from one defecation event, and we avoided collecting samples from faeces with scattered pellets. One handful of pellets (approx. 5–15) of each fresh pellet group was collected using an inverted freezer bag, which was then reversed and closed. Samples were stored frozen (− 19 °C) in the sealed freezer bags until analysis.

In order to ensure a sufficiently large sample size, we collected all fresh faeces detected along the transects. As a first approximation of the desired sample size, we determined the number of samples for analysis as 2.5–3 times the ‘assumed’ number of red deer in each population (as recommended in Solberg et al. 2006). We based a rough estimate of the ‘assumed’ number of red deer based on the mean hunting bag data for each study area (for PF, see Lang et al. (2016) and Hohmann et al. (2018); for HU and SO, see Gräber et al. (2016) and Hohmann & Hettich (2018)). Therefore, we assumed that the average harvest rate per year is lower than 50% in our study areas (Clutton-Brock et al. 2002), and thus, the assumed population density in spring will be at least twice the harvest rate. We considered 2.5 to 3 times the harvest rate as a sufficient rough estimate of assumed number of deer. For each area, we aimed to analyse a number of fresh samples corresponding as a minimum to (mean of the harvest rate of the 3 years preceding the sampling × 2.5) × 2.5, e.g. for the HU area (1.9 × 2.5) × 2.5 = 11.87, i.e. a minimum of 1187 samples for the study area. As an additional measure of sampling intensity, we furthermore aimed at reaching a mean number of detections per individual of at least two (Miller et al. 2005). Compared to the assumed number of individuals, this measure does not rely on preliminary assumptions about population size. However, we were restricted in all three areas to a maximum number of samples due to budget reasons. From the total number of samples collected in each study area, we selected a subsample according to the preliminary sample size considerations (see above) for analysis. To achieve a representative spatial coverage of the subsample, we calculated the percentage of all samples found for each transect. We then drew the number of samples corresponding to this percentage randomly from the samples found on the respective transect. With this approach, we ensured that the spatial distribution of analysed samples reflects the spatial distribution of the detected samples.

In addition to faecal sampling, we collected tissue samples from all harvested red deer older than 1 year (and thus possibly present in the area during faecal sampling) in the hunting season after our faecal sampling in the PF study area. We obtained these samples in cooperation with the local forestry offices and hunters. Our aim was to check the coverage of the faecal sampling by comparing how many of the harvested individuals had been recorded before.

Genetic methods

We analysed faecal samples 2 to 4 months after collection. For isolation of DNA, we used a commercial kit (NucleoSpin Soil Kit, Macherey-Nagel, Baesweiler, Germany). To achieve a high proportion of target DNA, the standard protocol was modified in that two pellets of each sample were washed with 1.5 ml lysis buffer at room temperature in a 50 ml Falcon tube, thereby avoiding the destruction of the pellets and including mainly the mucosal layer on the pellet surface containing most of the intestinal cells. We isolated DNA from tissue samples of the harvested deer using the QiaAmp DNA Mini Kit (Qiagen, Hilden, Germany).

We used seven dinucleotide microsatellite markers for individual identification (Table 2). We initially tested 16 microsatellites that had already been used for amplification of red deer DNA and selected loci based on their polymorphism and their ability to produce consistent results with non-invasively obtained samples (data not shown; references for the used loci in Table 2). In tests with tissue samples, this set of loci ensured a sufficiently high discriminative power (i.e. probability of identity (PID) sib consistently well below 0.01; Waits et al. 2001). We furthermore used an x- and y-chromosome-specific region of the amelogenin gene as a sex marker according to Gurgul et al. (2010). The eight markers were combined and co-amplified in two separate multiplex PCRs (Table S1 in Supplemental Material). The thermocycling profile was as follows: 95 °C for 15 min, 45 cycles of 94 °C for 30 s, 57 °C for 90 s, 72 °C for 60 s and then 60 °C for 30 min. Amplification reactions were performed in triplicates in a total volume of 12 μl using the Qiagen Multiplex PCR Kit (Qiagen, Hilden, Germany). The primers were used at concentrations of 0.1 to 0.4 μM (for details of the PCR conditions, see Table S1). We separated fluorescently labelled DNA fragments on an ABI3730 DNA analyser and determined allele size using the ABI GS500LIZ size ladder (Applied Biosystems, Darmstadt, Germany). We included two negative PCR controls in every PCR set to detect potential contamination.

Table 2 Microsatellites used for individual identification of red deer faecal samples

We deduced consensus genotypes from the triplicate results. Samples were typed as heterozygous at one locus if both alleles appeared at least twice and as homozygous when all replicates showed the same result. We repeated the genotyping another three times when results were ambiguous after the first three replicates. All samples that failed to amplify or to produce unambiguous results for more than two loci were discarded. For samples with one or two missing loci, we carefully re-checked raw data for plausibility in case of matching genotypes and excluded any samples matching with more than one genotype (Wilberg & Dreher 2004). We also excluded samples showing signs of cross-contamination (i.e. genotypes with more than two alleles) from further analyses.

For population size estimation, not only the genotyping error rates are relevant but also how often whole genotypes are incorrectly identified in spite of repeated analyses and error checking. Thus, we additionally estimated a misidentification rate (i.e. false identification of a whole sample due to genotyping errors or potential misinterpretation of data in the lab) by submitting 98 red deer faecal samples to the lab in duplicate without disclosing which samples were duplicates (i.e. a blind test). Duplicate samples that did not match their corresponding original sample (i.e. did not exhibit the same genotype) were defined as misidentifications. In order to increase our blind test sample size with samples of known individuals, we furthermore used faeces from 13 captive red deer. These samples were collected immediately after defecation under direct observation, so that we could record the defecating individual and its sex and age class for each sample. Each of the samples was divided arbitrarily into 2 to 6 aliquots yielding a total of 40 aliquots, which were numbered randomly (such that the lab personnel had no information how many different individuals had been sampled and which aliquots originated from which individual) and then sent to the lab for analysis. Genotyping in this case should result in correct identification of the number of different individuals and reveal which aliquot originated from which individual, if no misidentifications occurred. We genotyped all blind test samples using the same procedure as for other faecal samples, i.e. three PCR repeats per sample.

For all analysed samples, DNA extraction and PCR setup were carried out in separate rooms on different floors of the laboratory to avoid transfer of amplified DNA into the pre-amplification steps (Waits & Paetkau 2005).

Data analysis

Determination of matching genotypes was carried out using GENECAP (Wilberg and Dreher 2004) based on the PIDsib method with a match probability of 0.05. In order to confirm the power of the loci, we calculated the probability of identity (PID) and, being more conservative, PID for siblings (Waits et al. 2001) as well as heterozygosity using GIMLET (Valière 2002). We calculated genotyping error rates (allelic dropout [ADO] and false alleles [FA]) as recommended by Broquet & Petit (2004; Eqs. 1 and 3).

Estimation of population size and density

We calculated density estimates based on spatially explicit capture-recapture (SECR) modelling (Efford et al. 2004; Borchers & Efford 2008; Royle & Young 2008; Royle et al. 2014). This approach uses spatial information (i.e. where an animal was detected during a capture/sampling session) to estimate animal home range centres and spatially referenced capture probability (Borchers & Efford 2008). By fitting a spatial model of the population and the detection process to the spatial detection histories, SECR allows estimating population density directly from the sampling data and unbiased by edge effects and incomplete detection. We ran spatially explicit models for total population density as well as for the two sexes separately using the R package secr (Efford 2017a). Because we covered all study areas with a rather dense transect grid representing an area search, we used polygon as a detector type (Efford 2011; Efford 2017b). We used a hazard halfnormal detection function and defined the buffer for each data set as four times its initial sigma. For each data set, we ran a model assuming homogeneous sampling (null model M0, g0 ~ 1) and a model accounting for sampling heterogeneity (model Mh, g0 ~ h2). The Mh model assumes two different groups in the population that differ in their detection probability and estimates the probability to belong to each of these groups (pmix). We compared the fit of the two models via the Akaike information criterion with an additional bias correction term (AICc, Burnham & Anderson 2002).

As an additional approach, we estimated population size using the R package CAPWIRE (Miller et al. 2005; Pennel et al. 2013). CAPWIRE is a maximum likelihood–based approach designed for genetic data, enabling population size estimation with data from single sampling sessions. This is possible because in contrast to other approaches, CAPWIRE uses counts of detections and it is based on urn models that allow multiple captures per session (Miller et al. 2005). CAPWIRE enables the estimation of population size in the presence of sampling heterogeneity with the two-innate rates model (TIRM) and even performs particularly well when detection rates are heterogeneous (Miller et al. 2005). For the three study areas, we estimated population size for males and females separately. Thereby, we sought to minimize total sampling heterogeneity, since sex-specific differences, e.g. in space or habitat use or behaviour, can be a significant source of heterogeneity bias, particularly in a sexually dimorphic species like red deer (Clutton-Brock & Lonergan 1994; Harris et al. 2010). However, there remain multiple sources for sex-independent individual heterogeneity. A likelihood ratio test implemented in CAPWIRE allows selection between the equal catchability model (ECM, comparable to M0 in our SECR estimates) and the TIRM (comparable to Mh in our SECR estimates). We additionally tested for presence of sampling heterogeneity by simulating the sampling process under the assumption of homogeneity and comparing the expected number of captures with the observed number of captures per individual using the R package CMRPopHet (Puechmaille & Petit 2007).

Since in addition to population size, estimates of population density are of crucial interest for managers and for purpose of comparison, we also estimated population density based on the CAPWIRE estimates. However, density estimation is not straightforward when study areas are not geographically closed. In our study areas, the assumption of geographic closure is not met (however, we consider the populations as demographically closed since we conducted our sampling well out of the farrowing and the hunting season and the genotyped fresh faeces were not older than approx. 2 weeks). For density estimation, we approached the effectively sampled area (ESA) by adding the mean maximum recapture distance (MMRD) as a buffer around each study area. We calculated MMRD for each sampled individual using the R package secr (Efford 2017a). In a study by Parmenter et al. (2003) comparing different approaches, the MMRD showed the best performance for estimating the ESA. We calculated sex-specific ESAs for each study area separately by buffering the forested parts of the study area with the respective MMRDs and clipping the buffers where, e.g. motorways or settlements represent human-made borders (see, e.g. Fig. 1). We used these sex-specific ESAs to calculate population densities from the CAPWIRE population size estimates.

Results

Field sampling and genotyping

Of all the collected samples (Table 1), we included 1218, 1700 and 1000 for DNA analysis from study area HU, SO and PF, respectively, obtaining genotyping success rates of 67.6% (HU), 72.6% (SO) and 70.1% (PF, Table 3). Altogether, 2762 (70%) of the 3918 analysed samples yielded consensus genotypes useable for further analyses. Of the 1156 samples that did not yield useable red deer consensus genotypes, 83 (7.1%) were excluded because they had been identified by their genotypes as originating from roe deer. For all 2762 samples with consensus genotypes, eight loci amplified for 2536 samples (92%), seven loci for 146 samples (5.3%) and six loci for 80 samples (2.9%).

Table 3 Genotyping results of red deer faecal samples from three study areas in Rhineland-Palatinate

For the successfully analysed red deer samples, PID over all loci was estimated to be 3.184 × 10−11, and PIDsibs was 3.339 × 10−04. Mean expected heterozygosity (H exp) was 0.81 and mean observed heterozygosity (H obs) was 0.77 (Table 2). The mean ADO rate estimated over all markers was 4%, whereas the FA rate averaged 1% (Table 2).

Over all three study areas, we detected 1431 red deer individuals, of which 568 (40%) were classified as male and 833 (58%) as female, while for 30 individuals (2%), no sex could be assigned because the sex marker failed to amplify. When comparing the sex ratios (male to female) of the sampled deer between the three study areas, there were marked differences: In the HU area, the sex ratio was nearly even (1:1.1), while the red deer sampled in the SO area exhibited a considerably female-biased sex ratio (1:1.8). The red deer sampled in the PF area lay between these two extremes (1:1.3). The mean number of detections varied between the three study areas: In PF, males were sampled on average 2.04 times and females were slightly stronger represented with an average of 2.28 times (Table 4). In SO and HU, the two sexes showed almost equal mean detection numbers: SO averages: males 2.08 and females 2.07 and HU averages: males 1.64 and females 1.63. However, when examining the absolute numbers of sampled individuals (Table 3, Figs. S1, S2, and S3 in Supplemental Material), in SO and HU, significantly more female than male individuals were detected.

Table 4 Red deer population sizes and densities in three forested study areas in Rhineland-Palatinate, south-western Germany, estimated by using CAPWIRE (TIRM = two-innate rates model) and spatially explicit capture-recapture (SECR) model Mh

The total MMRD differed between the three study areas and within the study areas between the two sexes (Table 3). In all three areas, male red deer exhibited larger MMRDs compared to females, resulting in larger ESAs. The difference between the sexes was largest in the PF and smallest in the SO (Table 3).

Of the 51 tissue samples that were genotyped from deer > 1 year old harvested in the PF area, 31 (61%) matched a genotype present in the faecal sample.

Blind test

In the first blind test part, consensus genotypes were obtained for both original sample and duplicate in 63 of the 98 sample pairs. Of these, 62 pairs were found to match, and one pair showed differences in two loci as well in the sex marker and was thus considered as a mismatch. In the second part based on samples from captive deer, 34 of 40 subsamples yielded usable consensus genotypes. All 34 subsamples were assigned correctly to the right individual; thus, in this part, no misidentifications occurred. We then combined the samples of the two parts. Thus altogether, the blind test consisted of 97 samples, of which one was not assigned correctly, representing a total misidentification rate of 1.03%.

Estimation of population size and density

Miller et al. (2005) recommended an average of 2–2.5 observations per individual (corresponding to samples/individual in our study) to achieve estimates within 10–15% of the true population size. In the HU study, the sample size was below this threshold (mean number of samples/ind. 1.63). However, in the SO and PF studies, the sample sizes were within the recommended range (mean number of samples/ind. 2.07 and 2.16, respectively; Table 4).

For the SECR density estimates, the Mh model clearly fit the data better for all three study areas and for both sexes, indicating the presence of sampling heterogeneity (Table 4; see Table S2 in Supplemental Material for the results of both model M0 and Mh as well as the AICc results).

We compared the performance of the models ECM and TIRM by using the CAPWIRE-inherent likelihood ratio test. The results show that the TIRM better fits the data for both sexes in all three areas (Supplemental Material Table S4). Therefore, the CAPWIRE TIRM estimates were used for all data sets (Table 4).

The estimated population sizes differ strongly between males and females for the SO and PF areas, but are quite similar for the two sexes in the HU area (Table 4). The sex ratios of the estimated populations are 1:1.0 for HU, 1:1.7 for SO and 1:1.5 for PF. These ratios are very similar to those obtained when comparing the genotyped individuals instead of the estimated population sizes (HU: 1:1.1, SO: 1:1.8, PF: 1:1.3); only in the PF area, the difference is slightly more pronounced, probably an effect of differences in detection probability between the sexes (as reflected in the average numbers of detections, Table 4). In all three areas, sampling heterogeneity was detected for both sexes in the test following Puechmaille and Petit (2007), corroborating the result of the SECR goodness-of-fit tests and CAPWIRE likelihood ratio tests.

The population densities based on the CAPWIRE and the SECR estimates yield similar results for the HU and PF study areas and both sexes (Fig. 3, Table 4). When adding the CAPWIRE estimates for the two sexes to a total population density, the estimates also correspond well to the SECR total population densities; however, in the SO area, the total red deer density estimated with SECR is lower compared to the CAPWIRE estimate. Furthermore, the sex-specific CAPWIRE estimates exceed the SECR estimates for the SO area (Fig. 3, Table 4).

Fig. 3
figure 3

Red deer density estimates in three study areas (HU Hunrsück, SO Soonwald, PF Pfälzerwald) based on two different approaches: TIRM denotes estimates based on the CAPWIRE two-innate rates model (for details, see text); SECR represents densities estimated using a spatially explicit capture-recapture approach. The hatched bars behind the TIRM estimates represent the sum of the male and female density

Discussion

Our findings suggest that non-invasive genetic estimation of population density based on faeces is a practicable and effective monitoring approach for red deer in forested areas. Other approaches applied in our three study areas based on harvest statistics, terrestrial spot light counts and aerial surveys reveal significantly lower estimates compared with the genetic approach (for PF, see Lang et al. (2016) and Hohmann et al. (2018); for HU and SO, see Gräber et al. (2016) and Hohmann & Hettich (2018)). Population estimates obtained with these conventional methods tend to underestimate populations and are thus often exceeded considerably by DNA-based estimates (Kendall et al. 2009; Harris et al. 2010).

However, to yield reliable results, non-invasive genetic estimates need to be based on a sufficient sample size and a representative sampling design. Concerning sample size, Solberg et al. (2006) and Puechmaille and Petit (2007) recommend analysing 2.5–3 times as many samples as the assumed population size or the minimum population size assessed, e.g. using direct counts. Since minimum population sizes or other comparative numbers are often not readily available, it may be more practicable to use the average number of observations per sampled individual as recommended by Miller et al. (2005). In their simulation study, Miller et al. (2005) showed that in populations of up to 100 animals, an average of 2.0 observations per individual generated CAPWIRE population estimates in a range of ± 15% from the true N. In our study, the average number of observations per individual ranged from 1.63 in the HU area to 2.28 for the female part of the PF population. For SO and PF, both sexes showed average numbers above 2.0 (Table 4), indicating that the populations were adequately represented in the sample to obtain reliable estimates. Without the budget restrictions in the HU area, we may have been able to increase the number of analysed samples to reach the recommended average number of observations per individual.

Concerning the sampling design, from our experience, we consider sampling along transects to be a practicable and representative solution in areas where the terrain allows following a randomly placed transect. In our first sampling in the HU area, we spaced the transects quite densely (approximately 120 m apart). With this design, we collected far more samples than we would have needed considering the recommendations of Solberg et al. (2006). Therefore, we used a less dense transect grid in the following study areas with transects spaced 200–250 m apart, having in mind that this distance still is small compared to red deer home ranges in our study areas (Hamann et al. 1997).

In our study, the overall genotyping error rate lies in the range of those reported in other studies (reviewed by Valière et al. (2006)). With PIDsibs being well below 0.01, we consider the set of seven loci as sufficient for discrimination between individuals (Waits et al. 2001). By conducting three to six PCR replicates per sample, we were able to keep the total misidentification rate below 5%, which can be considered as sufficient for the purpose of population estimation (Taberlet and Luikart 1999; Lukacs and Burnham 2005a). However, even for genotyping error rates in the range of 5%, population estimates can be considerably biased (Roon et al. 2005). In a simulation study, Valière et al. (2006) showed that with residual errors of 1–3%, population estimates with a small relative bias (below 10%) can be obtained. Due to the rigorous genotyping protocol, the misidentification rate (1.03%) in our study lies in this range, and thus, we expect the resulting bias to be minor.

Heterogeneity in capture/sampling probability is a problematic issue for population estimation in traditional and in non-invasive CR approaches and can result in negative bias of population size estimates (Otis et al. 1978; Boulanger et al. 2004; Ebert et al. 2010). Individual sampling heterogeneity is considered to be present in nearly all CR data sets (Pledger & Efford 1998). Different sources of heterogeneity can bias non-invasive genetic population density estimates, e.g. sex, age class, social status, individual differences in behaviour or space use (Boulanger et al. 2004; Bischof et al. 2020). Furthermore, differences in defecation rates can result in heterogeneous detection probabilities (Marucco et al. 2009; Lunt & Mhlanga 2011), an aspect that in our opinion should be addressed in future studies. In our study, we tried to minimize heterogeneity by treating the two sexes separately. However, all the implemented tests still detected considerable heterogeneity, indicating the presence of non-sex-related sources of bias in the data.

One potential source of sampling heterogeneity concerning the male part of the population in particular could be the fact that there is an age-related difference in male spatial behaviour: juvenile and subadult male red deer normally stay with the female groups and therefore share the same spatial behaviour (Clutton-Brock et al. 1982). In contrast, adult stags live solitary or in bachelor herds most of the year and therefore potentially show a different space use compared to the juvenile and subadult males. This aspect could introduce bias (Hickey & Sollmann 2018; Bischof et al. 2020) that would not be accounted for by sex-specific estimation of population density. Further research, e.g. based on telemetry data of red deer of both sexes and different age classes, is needed to clarify if such age-class-related behavioural differences could introduce bias in sex-specific population density estimates. However, we consider estimating population density for the two sexes separately to be an important step to reduce overall bias. Furthermore, novel estimation approaches for social species that can take into account group structure like those proposed by Hickey and Sollmann (2018) and Bischof et al. (2020) could help to mitigate bias introduced by social structure.

An approach to decrease individual- and sex-based heterogeneity for red deer population estimation and to increase the sample size could be to use not only one but multiple different sampling strategies and/or sample sources. This can result in an increase in overall detection probability and reduction of overall bias (Boulanger et al. 2008; DeBarba et al. 2010; Dreher et al. 2007). Dreher et al. (2007) sampled harvested bears in addition to hair sampling. Sampling of harvested individuals would be easily feasible in many deer populations. The fact that 61% of the harvested red deer sampled in our PF study area matched with genotypes found in the faeces sampling suggests that this approach is promising. Furthermore, it provides a rough evaluation of the coverage of our sampling, when comparing the proportion of known genotypes (i.e. represented before in the faecal sample) in the hunting bag to the proportion of known individuals in the estimated population: The number of individuals detected via faecal sampling in the PF area (327) corresponds to 67% of the estimated total PF population size (when adding the male and female CAPWIRE estimates: 487; see Table 4). This corresponds quite well to the proportion of matches in the harvest sample, corroborating the representation of the population in our faecal sampling.

Population closure is a major concern in DNA capture-recapture studies like in traditional CMR (Boulanger and McLellan 2001). Closure violation can result in positive bias of population estimates, when animals move in and out of the sampling grid. The alternative use of open-population CR models suited for estimation of population size is not applicable for our data, since open models cannot account for individual heterogeneity and their use requires larger sample sizes and longer study times compared to closed models (Pollock 1990; Boulanger and McLellan 2001; Luikart et al. 2010). However, population estimates can still be meaningful when closure is violated. In that case, they can be considered as superpopulation estimates, i.e. the sampled population extends beyond the sampling grid boundaries (Kendall 1999; Roon et al. 2005; Tsaparis et al. 2014). Closure violation however represents a considerable source of bias for the determination of population density. Nevertheless, population density, i.e. the relation to the area for which a given population estimate is valid, is needed for most management purposes (Wilson and Anderson 1985). ‘Naïve’ density estimates (i.e. when the area of sampling is used as ESA for density estimation without any correction or buffer) can be severely biased when the closure assumption is not met (Wilson and Anderson 1985; Harris et al. 2010; Dupont et al. 2019). Since our study areas were not geographically closed, we added the MMRD as a buffer to the study areas for density calculation as one approach to reduce bias. However, different approaches to define the ESA exist and the size of the buffer has a significant impact on the estimated density (Parmenter et al. 2003; Tioli et al. 2009). Furthermore, the suitability of a given approach may differ between species and study areas. Parmenter et al. (2003) stated that MMRD can be constrained by the size of the sampling grid and therefore suggested the use of telemetry data when available for calculating the ESA, which was not the case for our study areas. However, to obtain valid data for ESA calculation via telemetry, a large number of tracked animals would be necessary, making this difficult to implement in many studies.

Spatially explicit methods hold the potential to mitigate spatial sources of bias in density estimates and thus represent a superior alternative to methods in which density has to be inferred by approximating ESA (Obbard et al. 2010). Potential bias introduced due to disturbance by human field personnel however is not avoided by using SECR; thus, sampling with minimal disturbance remains important. In our study, we sampled each transect only once and tried to cover parts of the area en bloc to avoid repeated disturbance of the animals. Another solution to reduce disturbance might be to sample in larger time intervals, e.g. every 5 to 10 days instead of collecting samples on consecutive days (Brinkman et al. 2011; Dupont et al. 2019). Overall, we believe spatially explicit approaches should be the estimators of choice when the aim is estimation of population density. Nevertheless, the application of a second approach, as in our case CAPWIRE, in our opinion can be useful to gain more insight into the data and to allow comparison with a wider range of other studies.

In our study, the density estimates based on the CAPWIRE TIRM with the MMRD as a buffer correspond quite well with the SECR estimates (Fig. 3, Table 4). The fact that two statistically independent approaches yield similar results might indicate that the estimated red deer densities are not too far from reality. The densities estimated via SECR exhibit considerably larger confidence intervals in most cases; however, the SECR estimates already account for a certain spatial heterogeneity and thus can be considered as more robust compared to the CAPWIRE estimates. Furthermore, CAPWIRE has been originally designed and performs best with small populations (Miller et al. 2005); thus, we cannot completely rule out a reduced performance of the CAPWIRE estimator in our studies. However, several studies report good performance of the CAPWIRE estimator in larger populations, e.g. around 200 (Granjon et al. 2016) and 450 (Miller et al. 2005, taxicab populations). Our study indicates that for large populations of around 1000 or more individuals, CAPWIRE might exhibit reduced performance: We calculated total population sizes for the three areas using CAPWIRE, and even though the TIRM was recommended for all areas, the estimated TIRM population size was not included in the confidence interval for HU and SO (Table S7). This has not been the case when calculating population sizes for the two sexes separately in both areas and thus using smaller data sets.

Conclusion and management implications

The population densities in the HU and SO areas both exceed the densities estimated in the PF area considerably, whereas HU and SO exhibit rather similar densities, however with the SO area showing a considerably higher proportion of females in its population. For setting hunting quotas, usually the reproductive output for the next fawning season is taken into account. If hunting is supposed to reduce the population sustainably, the harvest rate must be above the reproductive output. Assessments of reproductive output are often based on total spring female population including females born the previous year (Bertouille & De Crombrugghe 2002; Bonenfant et al. 2003). Hence, based on our genetic data set, which excluded samples from newborns (in all three areas, we finished sampling at least a month before the beginning of the fawning season) for the SO population, a higher reproductive potential can be stated. This finding might be an explanation for the fact that – compared with the HU area – in the SO area, a considerably higher harvest rate has been kept up sustainably for years (see Fig. 2).

For a mammal with distinct morphological and ethological differences between sexes, non-invasive genetic approaches are able to deliver estimates of sex ratio in contrast to often highly biased estimates based on hunting bag analysis or visual observations (Hagen et al. 2018). As far as management of red deer is management of two sexes (Loe et al. 2009; Cywicka at al. 2019; Harris et al. 2010), reliable insights into the population structure are crucial. The results of our study – particularly the example of the SO versus the HU population, both showing similar densities but quite different sex ratios – furthermore underline the importance of considering the two sexes separately for density estimation.

The non-invasive genetic approach presented here holds the potential to obtain reliable data on red deer populations, which have until now been lacking in many regions (Smart et al. 2004; Apollonio et al. 2017). This lack of reliable information is hampering the assessment of hunting management effectiveness. In our study areas, hunting quotas are adjusted based on damage levels. Our non-invasive genetic density estimates allowed basing the degree of adjustment on sound sex-specific population data for the first time.

However, the question remains if a genetic approach is feasible on a larger scale, especially for species like red deer occurring in high densities and thus requiring the analysis of many samples. Including lab consumables and labour input as well as collection of samples in the field, our studies cost approximately 40–45 € per sample (with lab work and consumables representing approximately 75 to 80% of the total costs). In comparison to other non-invasive genetic population estimation studies, the costs for our study lie in the lower range (costs for other ungulate studies ranged between 60 and 100$ per sample; see, e.g. Harris et al. (2010), Poole et al. (2011) and Goode et al. (2014)). Compared to other approaches applied in our study areas, the non-invasive genetic method is rather expensive. Spotlight counts, e.g. cost approximately 1 €/ha; this corresponds to only 20% of the costs of a genetic estimate. The costs for aerial counts ranged from 1 to 4.5 €/ha depending on the coverage (Ehrhart et al. 2016). Further research could try to lower the expenses for genotyping, e.g. by optimizing the laboratory protocol or by improving the pre-selection of samples according to their freshness (Luikart et al. 2020). Besides this, the genetic method could be regarded as a tool to calibrate other approaches: It could be applied in larger time intervals (e.g. every 5 to 10 years) and in between, e.g. spotlight or aerial counts, which are much less expensive, but also provide less informative value compared to genetic estimates (Garel et al. 2010; Collier et al. 2013; Ehrhart et al. 2016), could be used. Thus, a reliable long-term monitoring could be based on such a combination of different methods.