We focus on calls in this study, although the methods apply to other sounds produced by the animals as well. Marques et al. (2013) described four steps for estimating call densities from PAM data:
1. Identify calls produced by animals of the target population that relate to animal density, i.e., calls that are produced by a known proportion of the population (e.g., adult males) with some regularity following a mean call production rate (given, e.g., as the number of calls produced by an individual per day).
2. Collect a sample of \(n\) detections of calls using a well-designed survey protocol (e.g., the calls detected in the acoustic recordings in Fig. 1).
3. Estimate the false positive rate \(f,\) i.e., the rate of incorrectly classifying a detected sound as the call of the target species.
4. Estimate the average probability of detecting a call \(p\) within the search area.
More than one method is available for each step. While each of these four steps is necessary to estimate call density and relies on the previous steps, this paper focuses on a comparison of different methods for step 4, i.e., estimating the probability of detecting a call. To estimate animal density from PAM data, we would need to convert call densities into animal densities in a fifth step, which requires a conversion factor (e.g., the mean call production rate per individual; Marques et al. 2013).
To familiarize the reader with the four steps of PAM call density estimation and the bowhead whale dataset used in this study, we present a simple hypothetical example in Fig. 1. We are interested in estimating call density of bowhead whales during their fall migration from Canada into the Bering and Chukchi Seas; hence, we use all calls produced by bowhead whales (Mathias et al. 2008) (step 1). For step 2, we moor seven sensors at our study site in shallow waters (approximately 50 m), each capable of measuring the azimuth to the sounds it records. For reasons related to localizing calls (explained below), the spacing of the sensors should be chosen such that calls produced near one sensor (e.g., sensor A in Fig. 1) have a high probability of being detected at neighboring sensors (B and C). While the sensors are recording, bowhead whales migrate through the area and either make calls (e.g., whale W1 produces call C1) or remain silent (e.g., whale W3). Some calls are not detected (e.g., C1), while others are detected by one sensor (e.g., C4) or multiple sensors (e.g., C2). Other sounds might also be detected by the sensors and falsely classified as whale calls (e.g., C3).
As part of step 2, the recordings are analyzed for acoustic detections using either (i) a manual search protocol where human observers scan the recordings for calls, e.g., visually screening spectrograms of the acoustic data, or (ii) an automated, computer-based, detection and classification algorithm. The latter generally requires that a false positive rate be estimated (see step 3 above), typically by comparing the automatic detections with detections acquired by a human observer (as in (i)). In the case of large datasets, not all automated detections need to be verified to estimate a false positive rate (Marques et al. 2009). A systematic-random sample can be taken instead, where even spacing occurs between samples and a random starting detection is selected, to ensure both a representative and random sample, e.g., every 100th detection starting at the 32nd detection. After automated or manual detection, calls are matched across sensors, leading to a capture history similar to that illustrated in Table 1.
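The systematic-random verification sample described above can be sketched as follows. This is a minimal illustration, not the protocol used in the study: the function names, the 0-based indexing and the example of 10,000 detections are assumptions made here.

```python
import random

def systematic_random_sample(n_detections, step, seed=None):
    """Return 0-based indices of a systematic-random sample.

    A random start is drawn within the first `step` detections, then every
    `step`-th detection after it is selected.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    rng = random.Random(seed)
    start = rng.randrange(step)
    return list(range(start, n_detections, step))

def false_positive_rate(is_false_positive):
    """Proportion of manually verified detections that were not target calls."""
    if not is_false_positive:
        raise ValueError("need at least one verified detection")
    return sum(is_false_positive) / len(is_false_positive)

# Hypothetical example: 10,000 automated detections, every 100th one verified.
indices = systematic_random_sample(n_detections=10_000, step=100, seed=1)
# Hypothetical verification outcome for four detections (True = false positive).
f_hat = false_positive_rate([True, False, False, False])
```

Only the detections at `indices` would be passed to a human analyst; the resulting proportion estimates \(f\) for the full set of automated detections.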
Table 1 Capture history of detections at sensors A–G based on the example from Fig. 1
Calls detected on multiple sensors are localized using the call’s azimuths from the sensors (Fig. 1). Pomerleau et al. (2011) showed that the mean dive depth of bowhead whales does not exceed 100 m. As the difference between this and the sensor depth is much smaller than the distances from which bowhead whales can be detected (e.g., Thode et al. 2020), we ignore depth and use horizontal space in the following (Barlow and Taylor 2005). For these localized calls, the distance to the detecting sensors can be determined (Table 2). A natural consequence of this process is that only calls that are detectable at greater distances can be localized. Consider, e.g., a call produced 4 km south of sensor A. Even though this call may be close enough to sensor A to be detected with high probability, in order for it to be localized, it has to be detected by at least one more sensor, e.g., B or C at distances of 9.6 km or 11 km from the call, respectively.
Table 2 Distances (in km) between localized calls and sensors A–G, following the example from Fig. 1 and the capture history in Table 1
In this hypothetical example, the data used for SECR analyses would be the capture histories from Table 1, while the data used for DS analyses would be the distances from Table 2. Although for PS we do not use distances for model fitting, we use these distances to limit the analyses to counts of calls within a defined search radius. Calls only detected by one sensor cannot be localized; they therefore lack distance estimates and are not included in the PS or DS analyses. We refer to these single-detector calls as “singletons” in the following. SECR is the only method that includes singletons in the analysis.
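The distinction between localizable calls and singletons can be sketched from a capture history. The call identifiers and 0/1 values below are hypothetical and are not the entries of Table 1; the function name is likewise an assumption made for illustration.

```python
# Hypothetical capture histories (NOT the values from Table 1): one row per
# matched call, one 0/1 entry per sensor A-G.
SENSORS = ["A", "B", "C", "D", "E", "F", "G"]
capture_history = {
    "call_1": [1, 1, 0, 0, 0, 0, 0],
    "call_2": [1, 0, 0, 0, 0, 0, 0],
    "call_3": [0, 0, 1, 1, 1, 0, 0],
    "call_4": [0, 0, 0, 0, 0, 1, 0],
}

def split_by_detections(history):
    """Separate calls detected on two or more sensors from singletons."""
    localizable = sorted(c for c, row in history.items() if sum(row) >= 2)
    singletons = sorted(c for c, row in history.items() if sum(row) == 1)
    return localizable, singletons

localizable, singletons = split_by_detections(capture_history)
```

In this sketch, all four rows would enter an SECR analysis, whereas only the calls in `localizable` could contribute distances to DS or counts to PS.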
Analysis methods and assumptions
Here we summarize the formulas and assumptions for the three density estimation methods in the context of PAM. More complete descriptions of these methods in the context of PAM can be found in Marques et al. (2013) and, in general, for PS in Borchers et al. (2002), for DS in Buckland et al. (2015) and for SECR in Borchers and Efford (2008) and Borchers (2012). Using the notation from the four steps above, i.e., \(n\), \(f\) and \(p\), the estimator for call density \({D}_{c}\) in its most basic form is (Marques et al. 2013):
$${\widehat{D}}_{c}=\frac{n (1-\widehat{f})}{A \widehat{p} T} ,$$
(1)
where \(A\) is the total search area covered by \(K\) sensors and \(T\) is the duration of the recording. While \(K\) and \(T\) are the same for PS, DS and SECR, each method defines \({D}_{c}\), \(n\), \(f\), \(A\) and \(p\) differently. Hence, we use subscript notation for these quantities in the following.
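Eq. (1) translates directly into a small function. The sketch below is illustrative only; the function name, argument names, units and the numbers in the example are assumptions, not values from the study.

```python
def call_density(n, f_hat, area, p_hat, duration):
    """Eq. (1): estimated call density per unit area and unit time.

    n        -- number of detections
    f_hat    -- estimated false positive rate (0 <= f_hat < 1)
    area     -- total search area A covered by the K sensors
    p_hat    -- estimated average detection probability (0 < p_hat <= 1)
    duration -- recording duration T
    """
    if not 0.0 <= f_hat < 1.0:
        raise ValueError("f_hat must be in [0, 1)")
    if not 0.0 < p_hat <= 1.0:
        raise ValueError("p_hat must be in (0, 1]")
    return n * (1.0 - f_hat) / (area * p_hat * duration)

# Hypothetical numbers: 100 detections, 10% false positives, a 10 km^2 search
# area, half of all calls detected on average, and 2 h of recording.
d_hat = call_density(n=100, f_hat=0.1, area=10.0, p_hat=0.5, duration=2.0)
```

With these inputs the estimate is 9 calls per km² per hour: 90 true detections divided by 10 km² × 0.5 × 2 h.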
The average detection probability \(p\) within the search area is generally modelled using two main components: the absolute detection probability at zero distance from the sensor \({g}_{0}\), which is the probability that a call made at zero horizontal distance from the sensor is detected by the sensor, and a detection function \(g(y)\) that describes the decay in detection probabilities with increasing distance \(y\) from the sensor relative to \({g}_{0}\). Depending on the method, each component is either assumed or estimated from the data (see below). A frequently used detection function is the half-normal:
$$g\left(y\right)=\mathrm{exp}\left(-\frac{{y}^{2}}{2{\sigma }^{2}}\right), \sigma > 0.$$
(2)
Equation (2) contains one parameter, the scale parameter \(\sigma\), which needs to be estimated. Note that \(g(0)=1\). Larger \(\sigma\) values yield detection functions with high detection probabilities out to larger distances. In the following we use these components, \({g}_{0}\) and \(g(y)\), to compare \(p\) for the three different methods.
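As a worked illustration of how the scale parameter sets the range of the half-normal function in Eq. (2), consider the distance at which the detection probability relative to \({g}_{0}\) drops to 0.5. The symbol \({y}_{0.5}\) is introduced here only for this illustration. Solving \(g({y}_{0.5})=0.5\) gives:

$$\mathrm{exp}\left(-\frac{{y}_{0.5}^{2}}{2{\sigma }^{2}}\right)=0.5 \quad \Rightarrow \quad {y}_{0.5}=\sigma \sqrt{2\,\mathrm{ln}\,2}\approx 1.18\,\sigma .$$

This distance is proportional to the scale parameter, so doubling it doubles the distance at which half of the calls (relative to \({g}_{0}\)) are still detected.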
Plot Sampling (PS) for PAM data
PS is the simplest of the three estimation methods, but places the most demands on the PAM localization capability. PS limits the search area \({A}_{PS}\) to the \(K\) circles with radius \({w}_{PS}\) around the sensors, each circle with area \({a}_{PS}\), and includes only the calls localized within \({a}_{PS}\). Here, \({n}_{PS}\) is the sum of the number of detections within radius \({w}_{PS}\) around each sensor; a call that falls within the overlapping circles of several sensors is counted once for each of those circles. The total search area \({A}_{PS}\) equals \(K{a}_{PS}\).
PS assumes that all calls produced within the individual \({a}_{PS}\) are detected with certainty. To meet this assumption, the search area is typically limited to a relatively small radius \({w}_{PS}\). We can therefore assume that \({{g}_{0}}_{PS}=1\) and \({p}_{PS}=1\). Hence, this method does not require estimating a detection function, at the cost of rejecting large numbers of detections that originate outside \({w}_{PS}\). As we need to determine which calls originated from within \({w}_{PS}\), a successful PS application requires that all calls produced within \({w}_{PS}\) around any sensor are localized—hence, the required sensor spacing described above. Further assumptions are listed in Table 3.
Table 3 Summary of assumptions and their importance for PS, DS and SECR in the PAM context (based on Buckland et al. 2001; Buckland 2006; Borchers and Efford 2008; Marques et al. 2013; Thomas and Marques 2012)
The false positive rate \({f}_{PS}\) for calls within \({w}_{PS}\) is defined as the proportion of all sounds localized within \({w}_{PS}\) around the sensors that were falsely identified as calls of interest. It can be estimated as described above in Sect. 2, limiting the representative sample to calls localized within \({w}_{PS}\).
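The PS count and estimator can be sketched as below, assuming calls have already been localized to planar coordinates. The coordinates, function names and units (km, hours) are hypothetical and chosen only to show how a call inside two overlapping circles contributes once per circle.

```python
import math

def count_ps_detections(sensors, calls, w_ps):
    """n_PS: localized calls within w_ps of each sensor, summed over sensors."""
    return sum(
        1
        for sx, sy in sensors
        for cx, cy in calls
        if math.hypot(cx - sx, cy - sy) <= w_ps
    )

def ps_call_density(sensors, calls, w_ps, duration, f_hat=0.0):
    """Eq. (1) with p_PS = 1 and A_PS = K * a_PS (overlaps not subtracted)."""
    n_ps = count_ps_detections(sensors, calls, w_ps)
    total_area = len(sensors) * math.pi * w_ps ** 2
    return n_ps * (1.0 - f_hat) / (total_area * duration)

# Hypothetical layout: two sensors 3 km apart, search radius 2 km.
sensors = [(0.0, 0.0), (3.0, 0.0)]
calls = [(1.5, 0.0), (0.0, 1.0), (10.0, 10.0)]
n_ps = count_ps_detections(sensors, calls, w_ps=2.0)  # 3
density = ps_call_density(sensors, calls, w_ps=2.0, duration=1.0)
```

The first call lies within 2 km of both sensors and is counted twice, the second is counted once, and the third lies outside both circles, so `n_ps` is 3 while the total area is two full circles.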
Distance sampling (DS) for PAM data
Here, each sensor represents a point in a point transect survey, which is a form of DS (e.g., Buckland et al. 2001, chapter 5; Buckland 2006). In comparison to PS, we expand the search radius from \({w}_{PS}\) to a larger radius, \({w}_{DS}\), and include all \({n}_{DS}\) call detections within \({a}_{DS}\) (the circular area around a sensor with radius \({w}_{DS}\)) from each of the \(K\) sensors. Like PS, DS assumes that all calls at (or near) the sensor are detected with certainty, i.e., \({{g}_{0}}_{DS}=1\). However, we no longer assume that all calls within the area \({a}_{DS}\) around each sensor are detected with certainty. Instead, we fit a detection function \({g}_{DS}\left(y\right)\) to the distances between the sensors and the detected calls (e.g., as in Table 2) and use it to estimate the average detection probability within \({a}_{DS}\):
$${p}_{DS}=\frac{2}{{{w}_{DS}}^{2}}{\int }_{0}^{{w}_{DS}}{yg}_{DS}\left(y\right)dy .$$
(3)
An estimate of \({p}_{DS}\) can be obtained using Eq. (3), replacing \({g}_{DS}\left(y\right)\) with \({\widehat{g}}_{DS}\left(y\right)\) (Buckland et al. 2015). PS is thus a limiting case of DS in which the search radius \({w}_{DS}\) is shrunk to values small enough that \({p}_{DS}\) becomes 1. Similar to PS, \({n}_{DS}\) refers to the sum of the number of detections that fall within the search areas \({a}_{DS}\) of the \(K\) sensors, and any call that falls within overlapping search areas is counted towards \({n}_{DS}\) once for each sensor that detected it, together with its distance to the respective sensor. While this may seem to artificially inflate \({n}_{DS}\), the reasoning again arises from the requirement that the total search area \({A}_{DS}\) is \(K{a}_{DS}\), i.e., no subtraction of any overlapping areas occurs.
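Eq. (3) can be evaluated numerically for any fitted detection function. The sketch below uses a midpoint rule and the half-normal form of Eq. (2); the scale parameter (5 km), the truncation radius (10 km) and the function names are hypothetical choices for illustration, not estimates from the study.

```python
import math

def half_normal(y, sigma):
    """Eq. (2): half-normal detection function with scale parameter sigma."""
    return math.exp(-y ** 2 / (2.0 * sigma ** 2))

def average_detection_probability(g, w, n_steps=10_000):
    """Eq. (3) by the midpoint rule: p = (2 / w^2) * integral_0^w y g(y) dy."""
    h = w / n_steps
    total = 0.0
    for i in range(n_steps):
        y = (i + 0.5) * h  # midpoint of the i-th distance interval
        total += y * g(y)
    return 2.0 * total * h / w ** 2

# Hypothetical fitted function: sigma = 5 km, truncation radius w_DS = 10 km.
p_ds = average_detection_probability(lambda y: half_normal(y, sigma=5.0), w=10.0)
# Certain detection everywhere (g = 1) recovers the PS case, p = 1.
p_certain = average_detection_probability(lambda y: 1.0, w=10.0)
```

For these hypothetical values `p_ds` is approximately 0.432, and setting the detection function to 1 everywhere returns 1, which mirrors the statement that PS is a limiting case of DS.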
Multiplication of the search area \({a}_{DS}\) around a sensor with \({p}_{DS}\) yields a quantity called the effective area \({\nu }_{DS}={a}_{DS}{p}_{DS}\), which is the area around a sensor within which as many calls were missed as were detected outside. It can also be expressed in terms of the detection function (Buckland et al. 2015):
$${\nu }_{DS}= 2\pi {\int }_{0}^{{w}_{DS}}y{g}_{DS}\left(y\right)dy.$$
(4)
An estimate of the effective area, \({\widehat{\nu }}_{DS}\), can be obtained using Eq. (4), replacing \({g}_{DS}\left(y\right)\) with \({\widehat{g}}_{DS}\left(y\right)\). We can substitute \(K{\widehat{\nu }}_{DS}\) for \(A\widehat{p}\) in Eq. (1) for estimating call densities.
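For the half-normal detection function of Eq. (2), the integral in Eq. (4) has a closed form, which offers a quick check on numerical results. This is a worked special case added for illustration, not a result from the study:

$${\nu }_{DS}=2\pi {\int }_{0}^{{w}_{DS}}y\,\mathrm{exp}\left(-\frac{{y}^{2}}{2{\sigma }^{2}}\right)dy=2\pi {\sigma }^{2}\left[1-\mathrm{exp}\left(-\frac{{{w}_{DS}}^{2}}{2{\sigma }^{2}}\right)\right], \qquad {p}_{DS}=\frac{{\nu }_{DS}}{\pi {{w}_{DS}}^{2}}=\frac{2{\sigma }^{2}}{{{w}_{DS}}^{2}}\left[1-\mathrm{exp}\left(-\frac{{{w}_{DS}}^{2}}{2{\sigma }^{2}}\right)\right].$$

As \({w}_{DS}\) grows, \({\nu }_{DS}\) approaches \(2\pi {\sigma }^{2}\) while \({p}_{DS}\) approaches zero, so the effective area stays finite even when the search radius is very large.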
Another critical assumption for DS is that distances between the sensor and the calls are measured accurately, just like for PS. Uncertainty in localizations and, hence, in the distances, leads to bias in \({\widehat{p}}_{DS}\) and the estimated call densities (e.g., Borchers et al. 2010). The influence of minor random distance errors can be alleviated by fitting the detection function to binned distances, where the bin width is set to equal the distance error (Buckland et al. 2015). As only localized calls are included in fitting the detection function (as opposed to any detected call), the detection function describes the probability of localizing a call with increasing distance from the sensor (as opposed to the probability of detecting a call). It follows that the detection function \({g}_{DS}\left(y\right)\) in the PAM context considered here is a “localization function” rather than a detection function. Generally, we expect \({g}_{DS}\left(y\right)\) to decrease with increasing distance from the sensor and, although singletons are not localized, we expect the proportion of singletons to increase with distance from the sensor. Further assumptions are listed in Table 3.
The false positive rate \({f}_{DS}\) for calls within \({w}_{DS}\) is estimated as the proportion of all sounds localized within \({w}_{DS}\) around the sensors that were falsely identified as calls of interest. It can be estimated as described above in Sect. 2, limiting the representative sample to calls localized within \({w}_{DS}\).
Spatially explicit capture-recapture (SECR) for PAM data
For SECR we estimate the probability \({{g}_{0}}_{SECR}\) of detecting a call at distance zero as well as the detection function \({g}_{SECR}(y)\) from the capture histories of the calls (e.g., Borchers and Efford 2008; Borchers 2012). Hence, in comparison to PS or DS, we are not required to assume that all calls at/near the sensor are detected and we do not require call distances or locations. Furthermore, the data are not truncated by a search radius, i.e., \({w}_{SECR}=\infty\). All detected calls, along with their detection histories, are included in the analysis, regardless of the number of sensors they were detected on. Theoretically, with \({w}_{SECR}=\infty\), the total search area \({A}_{SECR}\) is infinite and \({p}_{SECR}\) approaches zero. Hence, in practice, we use a different approach where the search area \({a}_{SECR}\) around each sensor only extends out to a defined distance \({w}_{SECR}\) beyond which it is safe to assume that no calls can be detected (Efford 2019). Nonetheless, we do not estimate an average detection probability within \({a}_{SECR}\). Instead, we use estimates of \({{g}_{0}}_{SECR}\) and \({g}_{SECR}(y)\) to obtain an estimate of the effective area \({\nu }_{SECR}\). As for DS, it is estimated using a combination of the search area and the detection probabilities. However, in contrast to DS, the effective area \({\nu }_{SECR}\) is defined as the whole area surrounding the \(K\) sensors within which as many calls were missed as were detected beyond. It is estimated using the following steps (e.g., Stevenson et al. 2015). First we estimate the probability \({p}_{k}\left(X\right)\) that a call produced at location \(X\) (this location is unobserved) is detected by the \(k\)th sensor using:
$${p}_{k}\left(X\right)={{g}_{0}}_{SECR}{g}_{SECR}\left({y}_{X}\right),$$
(5)
where \({y}_{X}\) is the distance between \(X\) and the \(k\)th sensor. The probability \(p.\left(X\right)\) that the call was detected on at least one sensor becomes:
$$p.\left(X\right)=1-{\prod }_{k=1}^{K}\left[1-{p}_{k}\left(X\right)\right].$$
(6)
The effective area is obtained by integrating \(p.\left(X\right)\) over \({A}_{SECR}\). In practice this is done by dividing \({A}_{SECR}\) into \(I\) grid cells, each with size \({a}_{i}\), where the \({X}_{i}\) represent the center points of the grid cells:
$${\nu }_{SECR}={\sum }_{i=1}^{I}p.\left({X}_{i}\right){a}_{i}.$$
(7)
The estimate \({\widehat{\nu }}_{SECR}\) obtained using Eq. (7) replaces \(A\widehat{p}\) from Eq. (1) for estimating call density with SECR. Also in contrast to PS or DS, \({n}_{SECR}\) refers to the total number of unique calls included in the analyses for SECR and each call contributes to \({n}_{SECR}\) only once, regardless of how many sensors detected it (as opposed to \({n}_{PS}\) or \({n}_{DS}\) which refer to the number of detections for PS and DS, respectively).
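The grid approximation in Eqs. (5)–(7) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the software used in the study: it assumes a half-normal detection function, a rectangular grid formed by the sensors' bounding box plus a buffer (rather than circles of radius \({w}_{SECR}\)), and hypothetical parameter values and function names.

```python
import math

def half_normal(y, sigma):
    """Eq. (2): detection function relative to g0."""
    return math.exp(-y ** 2 / (2.0 * sigma ** 2))

def effective_area_secr(sensors, g0, sigma, buffer, cell):
    """Eq. (7): sum of p.(X_i) * a_i over a regular grid around the sensors.

    The grid covers the sensors' bounding box extended by `buffer` on every
    side; `cell` is the target cell width and is adjusted to tile exactly.
    """
    xs = [x for x, _ in sensors]
    ys = [y for _, y in sensors]
    x_min, y_min = min(xs) - buffer, min(ys) - buffer
    width = max(xs) + buffer - x_min
    height = max(ys) + buffer - y_min
    nx = max(1, round(width / cell))
    ny = max(1, round(height / cell))
    dx, dy = width / nx, height / ny
    nu = 0.0
    for i in range(nx):
        cx = x_min + (i + 0.5) * dx  # grid cell center, x
        for j in range(ny):
            cy = y_min + (j + 0.5) * dy  # grid cell center, y
            p_missed_by_all = 1.0
            for sx, sy in sensors:
                # Eq. (5): detection probability at sensor k
                p_k = g0 * half_normal(math.hypot(cx - sx, cy - sy), sigma)
                p_missed_by_all *= 1.0 - p_k
            # Eq. (6) gives p.(X_i); Eq. (7) weights it by the cell area a_i
            nu += (1.0 - p_missed_by_all) * dx * dy
    return nu

# Hypothetical single-sensor check: with g0 = 1 and sigma = 1 the effective
# area should be close to 2 * pi * sigma^2.
nu_one = effective_area_secr([(0.0, 0.0)], g0=1.0, sigma=1.0, buffer=6.0, cell=0.1)
nu_half = effective_area_secr([(0.0, 0.0)], g0=0.5, sigma=1.0, buffer=6.0, cell=0.1)
```

With a single sensor the result can be compared against the analytical half-normal value, and halving \({{g}_{0}}_{SECR}\) halves the effective area. With several sensors the product in Eq. (6) ensures that overlapping coverage is not double counted, in contrast to the per-sensor areas used for PS and DS.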
This method assumes that calls are matched reliably across sensors, that detections are made independently between sensors, and that no un-modelled heterogeneity in detection probabilities exists (i.e., the call detection function depends only on the distance to the sensor, or other appropriate covariates are included in the detection function model, e.g., Singh et al. 2014). Further assumptions are listed in Table 3. The assumption of independent detections between sensors, which emerges as a key factor in this study, means that the detection of a call on one sensor does not influence the probability of detecting that call on another sensor.
The false positive rate \({f}_{SECR}\) is estimated as the proportion of all calls detected on any sensor that were falsely identified as calls. In general, we expect the false positive rate to be higher for SECR compared to PS and DS, because the SECR analysis incorporates all call detections including singletons, and not just localized calls. In comparison, for PS and DS, both the truncation and the restriction to localized calls potentially eliminate many false detections from the analysis.