Introduction

Accurate estimation of animal abundance is fundamental to a number of ecological problems and a necessary part of assessing population status for wildlife management or conservation (Williams et al. 2002; Conroy and Carroll 2009). One challenge, however, is that detection probability is usually <1 and must also be estimated to produce unbiased estimates of population parameters (Lebreton et al. 1992; MacKenzie et al. 2002; MacKenzie et al. 2003). A number of population estimation methods can accommodate imperfect detection; for example, N-mixture modelling (Royle 2004), distance sampling (Buckland et al. 2001), time-to-detection (Garrard et al. 2008), and capture–recapture (CR) (Otis et al. 1978; Seber 1982). In the case of CR, detection probability can be estimated via captures and recaptures of recognised individuals in traps or other encounter detectors (e.g. camera traps, hair snares), where the location of encounter occurs at fixed sites where the detectors have been positioned by researchers (i.e. spatial CR) (Efford 2004; Borchers and Efford 2008).

Increasingly, however, capture–recapture data have been collected via structured searches of a study area (i.e. search–encounter) (Royle et al. 2011; Royle et al. 2014) where researchers travel a search path and record encounters with individuals (e.g. photographically) and use unique pelage patterns or body markings to identify individuals and construct capture histories (e.g. Auger-Méthé et al. 2010; Morrison and Bolger 2014; Grange et al. 2015; Marshal 2017). In a sense, the animal remains relatively stationary while the detector (e.g. a human with a hand-held camera) moves past, instead of the detector remaining stationary while the animal moves past.

Search–encounter sampling has been described as following an area search of uniform intensity (i.e. uniform search) or a fixed search path (Royle et al. 2014). With uniform search, the study area is divided into polygons with well-defined boundaries. Within each polygon, the area is searched at a uniform intensity such that, given an individual is present inside the polygon, it has a constant probability of detection. Sampling produces a location within each polygon for each detected individual on each survey occasion. This approach and spatial CR (SCR) estimation methods were used by Royle and Young (2008) to estimate abundance of flat-tailed horned lizards (Phrynosoma mcallii). An alternative uses a fixed search path identified ahead of a survey and independently of the locations of the animals in the population. For example, search paths might follow roads or trails within a nature reserve or protected area. Data collected from such sampling might consist of (1) locations of encountered individuals, (2) locations of individuals and of observers while on the search path, or (3) the closest distance from the search path to the individual location, similar to distance sampling. Royle et al. (2011) demonstrated use of approach (1) for the willow tit (Parus montanus), and Gowan et al. (2021) applied approach (3) to estimate abundance, recruitment and persistence of North Atlantic right whales (Eubalaena glacialis).

For data coming from approaches (1) or (2), encounters with individuals along a search path are typically modelled by estimating the total hazard to encounter (Royle et al. 2011; Royle et al. 2014). Instead of using encounter relative to the fixed points of a detector array, encounter is modelled relative to the search path, which is represented by a series of line segments, each delineated by two point locations. The sum of the hazard for each point location along the path yields the total hazard to encounter for the search path. Whether using fixed detector locations or points along a search path, distance from the animal to observer or detector is treated as a covariate affecting detection. For detectors, distance is that between the detector and an unobserved activity centre for the individual, whereas the distance covariate for search path sampling is between the observer and the present animal location (Royle et al. 2011). Because of this difference, analysis of search path data explicitly incorporates the movement of animals around their activity centres (Royle et al. 2014).

Search–encounter sampling along nature reserve roads and trails, combined with photographic data collection, has the potential to generate useful encounter data for abundance or demographic parameter estimation in SCR analysis. In this study, our goal was to apply this approach to estimate abundance of the plains zebra (Equus quagga), a species well-suited to non-invasive CR because of its individually unique pelage patterns (e.g. Grange et al. 2015). We apply a hazard-function analysis to model encounters with individuals by fitting Bayesian hierarchical models, and we compare a number of models to represent the hazard function. We demonstrate that data from search–encounter sampling produce abundance estimates similar to counts from independent methods. The application of these models to photographically collected encounter data, particularly opportunistically collected data, likely depends on the availability of auxiliary information to assess spatial extent and sampling intensity within a study area.

Methods

Study area

The study occurred at Telperion and Ezemvelo nature reserves, Gauteng and Mpumalanga provinces, South Africa (Fig. 1). Together, the reserves constituted c. 13,000 ha of protected predominantly grassland biome. The Wilge River separated an east section, containing Telperion, and a west section, containing Ezemvelo. Based on data from nearby Bronkhorstspruit weather station, the region experiences distinct wet and dry seasons, with c. 90% of rain falling during the months October to March (austral spring and summer). Average annual rainfall was 650 mm (range: 412 [1998]–949 [1989]), and range in average daily temperatures was \(4 \ ^\circ \text{C}\) in July to \(26 \ ^\circ \text{C}\) in January (Helm 2006).

The vegetation within Telperion and Ezemvelo is classified as Rand Highveld Grassland (Mucina et al. 2006) and Loskop Mountain Bushveld (Rutherford et al. 2006). Common grass species were Elionurus muticus, Eragrostis curvula and Setaria sphacelata, and common woody species included Englerophytum magalismontanum, Vangueria infausta, Faurea saligna, Burkea africana, Combretum apiculatum, Cussonia paniculata, Strychnos pungens, Protea caffra, Acacia caffra and Gymnosporia spp. (Helm 2006). In addition to plains zebras, common large herbivores include blesbok (Damaliscus pygargus phillipsi), greater kudu (Tragelaphus strepsiceros), blue wildebeest (Connochaetes taurinus taurinus), black wildebeest (Connochaetes gnou), red hartebeest (Alcelaphus buselaphus caama), common eland (Taurotragus oryx), giraffe (Giraffa camelopardalis) and springbok (Antidorcas marsupialis). Leopards (Panthera pardus) occur in the reserve but are not common. Smaller carnivores include African civet (Civettictis civetta), black-backed jackal (Canis mesomelas) and caracal (Caracal caracal) (Helm 2006).

Fig. 1

Telperion and Ezemvelo nature reserves, South Africa, showing search path

Data collection

We collected photographic data of plains zebras by driving a set route through both sections of the study area. We drove the entire route over 10 daily occasions in July–August 2017, recording our route with a global positioning system (GPS). We defined an encounter as a clear photograph of a zebra’s right flank (Fig. 2). We also recorded perpendicular distance between the vehicle and the animal with a range finder and the GPS coordinate of the observers.

Each photographed zebra was compared to a database of known individuals. If a photograph matched a recognised animal, we updated the encounter history with the new photograph; otherwise, we created an encounter history for a new individual. To ensure accuracy and consistency of identifications, all matching was conducted by one experienced observer (JPM). We also performed two additional checks after matching was complete: (1) within encounter histories for each individual to ensure that photographs were of the same individual, and (2) between encounter histories to ensure that an individual was not represented in more than one encounter history (Marshal 2017).

Fig. 2

Two encounters with the same plains zebra individual showing the matching stripe pattern, 1 (A) and 3 (B) August 2017, Telperion and Ezemvelo nature reserves, South Africa

Model formulation

Following Royle et al. (2011), we formulated a Bayesian hierarchical model that represented the observation process and the ecological process. The observation level consisted of a hazard function to represent encounters with individual animals as a function of distance between the animal location (\(\textbf{u}_{ik}\)) and a search path (\(\textbf{X}\)) consisting of segments delineated by two-dimensional coordinates (\(\textbf{x}\)). Our choice of hazard model was to accommodate heterogeneity in detection caused by a non-linear search path, and the consequential variable exposure of animals along the path to sampling (Kéry and Royle 2021). We used the Gompertz formulation for the hazard model:

$$\begin{aligned} \log (h(\textbf{u}_{ik}, \textbf{x})) = \beta _0 + \beta _1 \times ||\textbf{u}_{ik} - \textbf{x}||, \end{aligned}$$

where \(h(\textbf{u}_{ik}, \textbf{x})\) is the hazard of encounter at location \(\textbf{x}\) for individual i on occasion k, \(||\textbf{u}_{ik} - \textbf{x}||\) is the distance between the location of individual i on occasion k and point \(\textbf{x}\) on the search path, and \(\beta _0\) and \(\beta _1\) are parameters to be estimated. We modelled probability of detection (\(p_{ik}\)) from the cumulative hazard (\(H_{ik}\)), the sum of the hazard over the point locations along the search path, and the data augmentation inclusion variable (\(z_i\); described below):

$$\begin{aligned} p_{ik} = z_i \times (1 - \exp (-H_{ik})). \end{aligned}$$

Observed encounters (\(y_{ik}\)) given the animal location were distributed as a Bernoulli random variable:

$$\begin{aligned} y_{ik}|\textbf{u}_{ik} \sim \textsf{Bernoulli}(p_{ik}). \end{aligned}$$
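The observation model above can be illustrated with a minimal sketch. It assumes the search path is represented by a short list of point locations and uses hypothetical parameter values that are not estimates from this study; the function names are ours and are not part of the fitted JAGS model.

```python
import math

def gompertz_hazard(beta0, beta1, distance):
    """Hazard of encounter at one path point: exp(beta0 + beta1 * distance)."""
    return math.exp(beta0 + beta1 * distance)

def detection_probability(u, path_points, beta0, beta1, z=1):
    """p = z * (1 - exp(-H)), where H sums the hazard over all path points."""
    H = sum(gompertz_hazard(beta0, beta1, math.dist(u, x)) for x in path_points)
    return z * (1.0 - math.exp(-H))

# Hypothetical path points (km) and parameters, for illustration only
path = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0)]
p_near = detection_probability((1.5, 0.2), path, beta0=-1.0, beta1=-2.0)
p_far = detection_probability((1.5, 3.0), path, beta0=-1.0, beta1=-2.0)
p_absent = detection_probability((1.5, 0.2), path, beta0=-1.0, beta1=-2.0, z=0)
```

With a negative \(\beta _1\), an animal close to the path has a higher detection probability than one far from it, and an augmented individual that is not in the population (\(z_i = 0\)) cannot be detected.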

We defined the ecological process model assuming the individual activity centres (\(\textbf{s}_i\)) are distributed over a two-dimensional state space (S) and followed a uniform distribution:

$$\begin{aligned} \textbf{s}_i \sim \textsf{Uniform}(S), \end{aligned}$$

where S was defined by the perimeter of the study area (Royle and Young 2008). Because some of the individual locations (\(\textbf{u}_{ik}\)) were unobserved, we used a bivariate normal distribution for \(\textbf{u}_{ik}\):

$$\begin{aligned} \textbf{u}_{ik}|\textbf{s}_i \sim \textsf{Normal}(\textbf{s}_i, \mathbf{\sigma }^2_{move} \textbf{I}), \end{aligned}$$

where \(\mathbf{\sigma }^2_{move}\) represented variability in movement around the activity centre and \(\textbf{I}\) was a \(2 \times 2\) identity matrix.
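As a sketch of the ecological process model, the following simulates activity centres and occasion-specific locations. It assumes, for simplicity, a rectangular state space rather than the reserve perimeter used in this study, and the function name and argument values are hypothetical.

```python
import random

def simulate_locations(n_ind, n_occ, xlim, ylim, sigma_move, seed=1):
    """Draw uniform activity centres, then bivariate normal locations around them."""
    rng = random.Random(seed)
    centres, locations = [], []
    for _ in range(n_ind):
        # Activity centre s_i ~ Uniform over the (assumed rectangular) state space
        s = (rng.uniform(*xlim), rng.uniform(*ylim))
        centres.append(s)
        # Location u_ik ~ Normal(s_i, sigma_move^2 * I): independent x and y draws
        locations.append([(rng.gauss(s[0], sigma_move), rng.gauss(s[1], sigma_move))
                          for _ in range(n_occ)])
    return centres, locations

# Hypothetical 10 km x 13 km state space, 10 occasions, sigma_move = 0.8 km
centres, locations = simulate_locations(5, 10, (0.0, 10.0), (0.0, 13.0), 0.8)
```

Because the covariance is \(\mathbf{\sigma }^2_{move} \textbf{I}\), the two coordinates can be drawn independently with the same standard deviation.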

We used data augmentation to estimate population size (N) within S (Royle and Young 2008; Royle et al. 2009; Gardner et al. 2009). We defined M as the number of individuals encountered during the study (n) plus the augmented individuals with all-zero capture histories (\(M-n\)), some of which represented animals in the population that were available to be encountered but were not encountered. We defined a binary latent variable (\(z_i\)) to identify which individuals in M were in the population N. The \(z_i\) were assumed to follow a Bernoulli distribution with probability \(\psi\). We estimated N as the sum of the latent variables: \(N = \sum _{i=1}^{M} z_i\).
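The relationship between M, \(\psi\) and N can be sketched by simulating the inclusion variables directly. The value of \(\psi\) below is hypothetical, and in a fitted model the \(z_i\) for encountered individuals are necessarily 1, a constraint this prior simulation omits.

```python
import random

def simulate_inclusion(M, psi, seed=1):
    """Draw z_i ~ Bernoulli(psi) for i = 1..M and return (z, N = sum of z)."""
    rng = random.Random(seed)
    z = [1 if rng.random() < psi else 0 for _ in range(M)]
    return z, sum(z)

# M = 1500 matches the augmented total used in this study; psi is hypothetical
z, N = simulate_inclusion(M=1500, psi=0.7)
```

Under this prior the expected population size is \(\psi M\), so M must be large enough that the posterior of N is not truncated by it.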

Sensitivity analysis

We assessed the effects of different models for the hazard function in the detection process, with the goal of choosing a model that adequately fit the observation data. Specifically, in addition to the Gompertz function, we considered the normal kernel and Weibull functions, and a function based on the squared distance (model details in Royle et al. 2011). We divided the search path into segments with length 1.5 km, based on preliminary analysis indicating that \(\sigma _{move} \approx 0.8\) km; 1.5-km spacing kept segment length \(<2\sigma _{move}\) (Sun et al. 2014). We evaluated the sensitivity of the abundance estimate to segment length by running the analysis with 1.25- and 1-km segments.
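One way to place points at fixed spacing along a recorded route, and to check the spacing against twice the movement scale, is sketched below. This is an illustration under our own assumptions (a polyline of vertices in km and a hypothetical helper name), not the procedure used to prepare the data in this study.

```python
import math

def discretise_path(vertices, spacing):
    """Return points placed every `spacing` units along a polyline."""
    points = [vertices[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0:
            continue
        d = spacing - carried  # distance along this segment to the next point
        while d <= seg_len:
            t = d / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carried = seg_len - (d - spacing)
    return points

# Hypothetical L-shaped route (km) with the 1.5-km spacing used in the text
points = discretise_path([(0.0, 0.0), (4.0, 0.0), (4.0, 2.0)], spacing=1.5)
spacing_ok = 1.5 < 2 * 0.8  # spacing below twice the c. 0.8-km movement scale
```

Spacing is measured along the route, so points continue across vertices rather than restarting at each one.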

Model implementation

We conducted analyses in JAGS (Plummer 2003), run from R (R Core Team 2023) by using the library jagsUI (Kellner 2021). We used noninformative uniform priors for parameters \(\beta _0, \beta _1, \log (\sigma )\) and \(\psi\), and augmented encounter histories to a total of \(M = 1500\). We ran three independent chains of sufficient length to allow for convergence (10,000–20,000 iterations) following a burn-in period (5000 iterations). We thinned the chains by retaining one value for every 10 iterations to reduce the effects of autocorrelation on posterior estimates. We assessed convergence with the Brooks-Gelman-Rubin diagnostic (\(\hat{R} < 1.1\)) (Gelman and Shirley 2011).
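The convergence diagnostic compares between-chain and within-chain variance. The sketch below computes the basic form of \(\hat{R}\) for one parameter from simulated chains; it is an illustration only, and the value reported by jagsUI may include refinements not shown here.

```python
import random
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean([statistics.variance(c) for c in chains])  # within-chain
    B = n * statistics.variance(chain_means)                        # between-chain
    var_hat = (n - 1) / n * W + B / n
    return (var_hat / W) ** 0.5

rng = random.Random(1)
# Three simulated chains sampling the same distribution (well mixed)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
# Three simulated chains stuck at different values (not converged)
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 5.0, 10.0)]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Chains that sample the same distribution give a value close to 1, whereas chains centred on different values exceed the 1.1 threshold.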

We used Bayesian P-values to assess goodness of fit of the detection models (Gelman and Shirley 2011), based on a variation suggested by Royle et al. (2014) that assessed discrepancies between expected and observed number of encounters across individuals. An adequate-fitting model was indicated by Bayesian P-values within the range 0.1\(-\)0.9 (Royle et al. 2014). To assess the spatial randomness assumption for the ecological process model, we calculated an index-of-dispersion test based on the ratio of the variance to mean (Illian et al. 2008) in the number of activity centres across a defined number of grid cells covering the study area. We also calculated a Bayesian P-value, based on a Freeman-Tukey statistic, for the discrepancy between counts from the posterior sample of activity centres in grid cells and simulated counts under spatial randomness (Royle et al. 2014). We calculated both statistics with function SCRgof in library scrbook (Royle et al. 2014). We report estimated posterior means and 95% credible intervals for model parameters. R and JAGS scripts to fit the models are in the Supplementary Material.
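The two grid-based statistics can be sketched as follows. This is a simplified illustration with hypothetical helper names and a rectangular grid; it is not the SCRgof implementation, and it computes the statistics for a single set of activity centres rather than across a posterior sample.

```python
import math
import statistics

def grid_counts(points, xlim, ylim, nx, ny):
    """Count points in each cell of an nx-by-ny grid (points assumed inside limits)."""
    counts = [0] * (nx * ny)
    for x, y in points:
        i = min(int((x - xlim[0]) / (xlim[1] - xlim[0]) * nx), nx - 1)
        j = min(int((y - ylim[0]) / (ylim[1] - ylim[0]) * ny), ny - 1)
        counts[j * nx + i] += 1
    return counts

def index_of_dispersion(counts):
    """Variance-to-mean ratio of cell counts; near 1 under spatial randomness."""
    return statistics.variance(counts) / statistics.fmean(counts)

def freeman_tukey(counts):
    """Sum of squared differences between root counts and root expected count."""
    expected = statistics.fmean(counts)
    return sum((math.sqrt(c) - math.sqrt(expected)) ** 2 for c in counts)

# Hypothetical activity centres: four clustered in one cell vs one per cell
clustered = grid_counts([(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.1, 0.3)],
                        (0.0, 2.0), (0.0, 2.0), 2, 2)
even = grid_counts([(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)],
                   (0.0, 2.0), (0.0, 2.0), 2, 2)
```

Clustering inflates the variance of cell counts relative to their mean, so both statistics increase; a Bayesian P-value would compare such a discrepancy with the same discrepancy computed on counts simulated under spatial randomness.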

Results

We analysed a total of 821 encounters from 383 recognised individuals. Number of occasions on which individuals were encountered ranged from 11 for 1 individual to 2 for 99 individuals. One-hundred eighty-one zebras were encountered on 1 occasion. Our assessment of survey line point coverage revealed little difference between estimates using 1-, 1.25-, and 1.5-km segment lengths (Table S1). Presented here are estimates for the 1.5-km analysis (Table 1).

Table 1 Posterior summary statistics of movement and abundance parameter estimates for plains zebra, based on four hazard function detection models and 1.5-km search path segments; Telperion and Ezemvelo nature reserves, South Africa, July–August 2017

For three of the four detection models, the posterior mean of zebra abundance was just over 1000 individuals (95% credible interval limits c. 960 and 1220) and varied little among those models. The Weibull detection model produced a lower estimate than the other three models: 811 (719, 917).

For all detection models, \(\sigma _{move}\) was c. 0.8 km and varied little among models (Table 1); the Weibull model produced a slightly higher estimate of 0.88 km. Bayesian P-values for all models indicated adequate fit to observed detection frequencies (Table 1), with values ranging between 0.21 (Weibull) and 0.49 (Gompertz). However, the ecological process model appeared to deviate from the spatial randomness assumption (index-of-dispersion and Freeman–Tukey: \(P < 0.001\) for all models), with local density appearing to be relatively high in the south-central and north-east portions of the study area (Fig. 3).

Fig. 3

Posterior distribution of activity centres (circles) for observed plains zebras, July–August 2017, Telperion and Ezemvelo nature reserves, South Africa. Darker map pixels indicate higher density, and the dark line is the search path

Discussion

Our application of photographic and search–encounter sampling for plains zebras at Telperion and Ezemvelo showed that three of the four detection models estimated abundance in the range of 1080–1088 animals, numbers that were similar to counts generated from aerial surveys conducted earlier the same year (1374, February 2017) and early the following year (1106, January 2018; Oppenheimer Generations Research and Conservation, unpublished data). Because the aerial counts occurred around the annual peak in zebra births, the lower SCR estimates could reflect in part higher mortality and reduction in first-year animal numbers by the food-scarce dry season. The Weibull model produced a somewhat lower SCR estimate, an outcome similar to that of Royle et al. (2011). All four detection models showed evidence of adequate fit to our encounter data; however, the assumption of spatial randomness was not supported for the ecological process model.

Ideally, search paths are located randomly with respect to animal locations (Conroy and Carroll 2009), something that often is not possible in protected areas where vehicles are required to remain on designated routes. We argue, however, that potential for bias in our study was minimal. A histogram of perpendicular distances indicated no evidence that zebras avoided roads (Fig. S1), and the open grassland landscape and relatively flat terrain of the study area suggested little benefit to using roads as travel routes to the extent that it would distort encounter rates.

Spatial randomness

Because animals are selective about where they occur on the landscape, higher densities of individuals in favourable parts of the environment should be expected. This selectivity could generate the appearance of clustering even with independent activity centres (Royle et al. 2014). Alternatively, animals expressing territoriality or avoidance of conspecifics could exhibit activity centres that are more evenly distributed on the landscape than expected from spatial randomness, in which case locations of activity centres are not likely independent between individuals (Reich and Gardner 2014).

However, the uniform statistical distributions describing the locations of activity centres are able to accommodate a variety of spatial patterns, including a range of clustering and spacing patterns (Royle et al. 2014). Moreover, the uniform priors have minimal effects on locations of activity centres if models are fitted to a large enough data set (Royle et al. 2014). In addition, data simulation by Kéry and Royle (2021) suggests a minimal influence on estimation if spatial variation in density is ignored. In their analysis, models based on a uniform distribution of activity centres, habitat categories and density surface modelling produced abundance estimates with considerable overlap of posterior distributions, differing mainly in the variation explained in the data. Kéry and Royle (2021) did not conduct an exhaustive simulation study, but their analysis supports minimal bias. Thus we expect varying density in the study area to have a minimal influence on estimation of overall abundance, affecting the locations of the activity centres rather than their number.

Although it was not our objective to investigate drivers of local variation in zebra density across the study area, relationships between landscape attributes and density might be a question of ecological interest and could be accommodated in SCR modelling (Royle et al. 2014). Including such relationships in the ecological process model might produce an intensity surface for the point process model that accounts for the inhomogeneity in the underlying intensity or density function. An alternative to modelling intensity as a function of environmental covariates might be to use cluster process models having an intensity function with multiple clusters, each cluster representing a high-intensity region of state space (Illian et al. 2008).

A potential consequence of unmodelled spatial heterogeneity in activity centres is an effect of proximity to survey path on individual detection probability. If preferential space use produces heterogeneity in encounters because of differing probabilities of detection between individuals, such heterogeneity could bias estimates of abundance (Royle et al. 2013) and should be apparent through poor fit between a detection model and the observed numbers of encounters across individuals. Our detection models did not account for individual heterogeneity and yet there was no evidence of poor fit to the encounter data, suggesting a minimal influence on bias.

Zebra herding and cohesion

Plains zebras occur in herds, and so the assumption of independent encounters might be violated because zebras occur in social groups (aggregation) or have non-independent encounters (cohesion) (Bischof et al. 2020). Estimates of abundance from SCR are, however, robust to low–moderate spatial dependence: there is low bias with social groups, and low–moderate amounts of cohesion or dispersion have minimal effects on bias or precision; however, overdispersion strongly affects coverage of confidence intervals around parameter estimates (Bischof et al. 2020).

Zebras generally occur in groups of approximately a half-dozen animals, and rarely more than a dozen (Estes 1991). This degree of social grouping suggests a minimal effect on bias, but a probable effect on coverage. Modelling groups explicitly (e.g. Hickey and Sollmann 2018; Emmet et al. 2022) is probably not currently feasible for this population; additional data might assist in identifying zebra herds and assigning individuals to herds. However, we encountered many individuals once, and a single observation would not be sufficient to distinguish herd mates from animals in separate herds but encountered coincidentally. In the examples of both Hickey and Sollmann (2018) and Emmet et al. (2022), study animals had been the subject of long-term monitoring, and so social structure and group membership had been well established.

Search–encounter and opportunistic data

Citizen science schemes and online wildlife databases (e.g. iNaturalist, Macaulay Library) are a growing source of photographic data to which search–encounter SCR models might be applied, particularly with species that have individually unique identifying features. Because of photo-recorded animal encounters, geographic coordinates for each encounter via GPS-enabled devices, and machine learning to ease the effort of matching individual animals based on pelage patterns, there is the possibility of using opportunistically collected data to develop spatially explicit encounter histories of wildlife species. With sufficiently photographed species in highly visited areas, such data could yield estimates of abundance or other demographic parameters in situations where resources to conduct structured surveys are inadequate. Data generation via such schemes, however, amounts to unstructured spatial sampling, where locations, intensity and protocols of sampling occur opportunistically, rather than according to a pre-defined systematic approach (Royle et al. 2014), which could lead to spatially biased sampling.

The problem of spatial sampling bias is well-recognised in species distribution research (Kéry et al. 2010; Botts et al. 2011; Hugo and Altwegg 2017; Binley and Bennett 2023), and it has been addressed for some atlas databases by explicitly modelling separate observation and ecological processes (Kéry et al. 2010; van Strien et al. 2013; Péron and Altwegg 2015). Others integrate multiple data sets, where systematic presence-absence data (PA) are combined with opportunistic presence-only (PO) data in a model representing an ecological process for abundance or density and conditional observation processes for each of the PA and PO datasets (Sun et al. 2019). Despite the problems, however, potential data for such analyses are widely available if they can be analysed appropriately. Recognising that photographic encounters are the product of an observation process and developing models that describe that process will contribute substantially to producing more reliable estimates of abundance or demographic parameters from photographic encounter data.