Abundance-based detectability in a spatially-explicit metapopulation: a case study on a vulnerable beetle species in hollow trees

In many fragmented habitats, the detectability of a population in a habitat patch closely depends on the local abundance of individuals. However, metapopulation studies rarely connect abundance and detectability. We propose a framework for using abundance-based estimates of detectability in the analysis of a spatially-explicit stochastic patch occupancy model (SPOM). We illustrate our approach with the example of Tenebrio opacus, a beetle inhabiting hollows in old trees, and have based it on a 6-year monitoring programme of adult beetles in an area harbouring a high density of old oaks. We validated our abundance-based methodology by showing that the estimates of detectability were positively and significantly correlated with those obtained from presence/absence data (Pearson r = 0.54, p < 2E−16) in our study system. We further showed that the height of the hollow on the tree and the area of its entrance hole, the living status and girth of the host tree, and the time of survey significantly affected the detectability of beetle populations. Median detectability was 51% for one survey. The SPOM analysis revealed a high but heterogeneous extinction risk among trees, suggesting a metapopulation dynamics between the “classic” and “mainland–island” paradigms. However, it also indicated unexplained beetle colonization of trees in our study, despite the fact that we included limited detectability in our estimation procedure. This may have been due to the cryptic larval stage of T. opacus and may thus invalidate the use of a classic SPOM in our study system. Electronic supplementary material The online version of this article (10.1007/s00442-018-4220-5) contains supplementary material, which is available to authorized users.


Rationale of the geometric sample model
During a survey event, observation on a tree went on as long as new individuals were detected, without a priori absolute time limit for sampling. We modeled this as follows : each individual is detected at a rate ρ, and when no individual is detected during some time span t max , sampling stops.
If one calls n tot the true number of individuals in the tree when the observation takes place, then the time before detecting the first individual T 1 is exponentially distributed with rate r 1 = n tot ρ. The time between detecting the first and the second individual is exponentially distributed with rate r 2 = (n tot − 1)ρ. More generally, the time between detecting the k th and the k + 1 th individual is exponentially distributed with rate r k = (n tot − k + 1)ρ.
Consequently, no individual is detected if T 1 > t max . One individual is detected if T 1 < t max and T 2 > t max . More generally, k individuals are detected if T 1 , ..., T k < t max and T k+1 > t max . We now derive the probability P k that k individuals are detected, that is the probability that T 1 , ..., T k < t max and T k+1 > t max . Because the times T k are independent one from another: P k = e −t max r k+1 Π k l=1 (1 − e −t max r l ) = e −t max ρ(n tot −k) Π k l=1 (1 − e −t max ρ(n tot −l+1) ) This observation model is rather uneasy to adjust to data. However, one may consider the reasonable limit case where the individual detection rate ρ is low (ρ → 0), the true number of individuals in the metapopulation is large (n tot → +∞) and the product of both quantities n tot ρ, which is the rate of detection if the first individual in the tree r 1 , remains finite. Then P k boils down to the classical geometric distribution with parameter g = e −t max r 1 : We modeled the relationship between parameter g, temporal features of survey and tree features in a rather phenomenological way, using a logit link function and a linear combination of dependent variables (equation 1 of main text). Detectability at visit scale, i.e. the probability of detecting at least one individual, thus verifies φ = 1 − g, which explains equation (2) of main text.

Building the response variable: positive count data
The number of oak trees in the Bjarka Saby area was 338. The number of surveys in the study area was 4151. Most of the surveys did not lead to any detection of individuals. The number of surveys with no individual observed was 3636.
The calibration of the geometric observation model (equation 1 in main text) relies on the surveys that yielded strictly positive numbers of observed individuals. The number of surveys with detection of at least one individuals is 515, over 145 trees. Figure S2.1 shows the distribution of the number of observed individuals among surveys which yielded detection of individuals. Building covariates to explain positive count data

Temporal covariates
We used three temporal covariates to explain the variation in the number of observed individuals presented above: date, time and temperature. We call them "temporal" because they can vary between two surveys performed on the same tree. The date is expressed in number of days since the 1st of January. Note that we pooled all the years together (thus neglecting any year effect). The time is expressed in number of minutes since previous noon. The temperature is expressed in Celsius degrees.   It comes as no surprise that this relationship is modal with maximum (i.e. latest surveys in the night) around 180, which corresponds to June-July and the longest days of the year. We call "time" in subsequent analysis the residuals obtained from this model. By doing so, our "time" variable actually measures a time while controlling for the date.
We also found that temperature depends on both date and (residual) time ( Figure SX.3  Here again results are rather intuitive. We find a modal relationship of temperature with respect to date, with maximum around 210, which corresponds to July-August. We find a negative effect of residual time: the later in the night, the colder. We call "temperature" in subsequent analysis the residuals obtained from this model. By doing so, our "temperature" variable actually measures a temperature while controlling for both date and time.

Hollow covariates
Hollows are described using two variables: (i) the area of the entrance (approximated using height x breadth, expressed in squared centimeters) and (ii) height of the entrance on the trunk (expressed in meters). Trees in the area of study could harbour several hollows ( Figure S2.4). Hollows belonging to the same tree are aggregated together, considering the total area (summing across hollows) and the mean height. We thus obtain two covariates depicting hollow features at the tree level: the "mean hollow height" ( The histogram of the total hollow area in trees is very skewed ( Figure S2.6), which would potentially have generated statistical issues in subsequent analyses. We therefore log-transformed the total hollow area ( Figure S2.7). From now, we call "Total hollow area" in subsequent analysis this log transformed variable.

Number of hollows
Log total hollow area In particular, there was no significant correlation between the total hollow area and the mean hollow height according to a Kendall correlation test: ## ## Kendall's rank correlation tau ## ## data: mHH and sHA ## z = 0.23056, p-value = 0.8177 ## alternative hypothesis: true tau is not equal to 0 ## sample estimates: ## tau ## 0.00902062

Other tree features
In addition to hollow features mentioned above, we also included tree girth (in centimeters), tree living status (alive=1, dead=0) and sun exposure (qualitative assessment from 0 to 2).

Girth
Tree girth was related to mean hollow height and total hole area ( Figure S2.8)  These results are not surprising, they basically mean that bigger trees tend to harbour bigger holes, which stems from both a geometric constraint and an age effect of the tree. : This rather tight relationship may generate issues when using both girth and hollow features in the modeling of numbers of observed individuals. We therefore choose to keep only the residuals of the model presented above as our "girth" variable, that is the part of tree girth that is independent from hollow features. It may be seen as a measure of tree girth while controlling for the hollow features.

Living status
We did the same kind of analysis for living status as for girth, exploring potential relationships with hollow features through the following generalized linear model: We found that trees where holes were larger and closer to the ground were more likely to be dead. Here again we focused on the residual effect of the living status when controling for hollow features by considering the working residuals of this model as our living status variable.

Sun exposure
Sun exposure did not show any strong relationship with the hollow features, as shown by the linear model: We computed the log-likelihood of the positive count data under a full geometric model (one parameter per observation) considering only surveys with no missing data in covariates, which equaled -466.7989054. The likelihood of the selected geometric model equaled -740.3154274, which corresponds to a deviance of 547.033044 when compared to the full geometric model. The null geometric model (one parameter only, used for all the surveys) yielded the log-likelihood -777.4303024, which corresponds to a deviance of 621.262794. We could therefore compute the Mac Fadden R 2 of the model (defined as 1 − ModelDeviance/NullDeviance), which equaled 0.119482.

Goodness of fit
We explored the goodness of fit of the selected geometric model by simulating 1000 virtual datasets using maximum likelihood parameter estimates provided above. For each simulated dataset, we computed the new maximum likelihood estimates, and reported corresponding log-likelihood ( Figure S2.9). We observed that likelihood of real data did not significantly differ from that of simulations, suggesting an adequate fit of the model to the positive count data.

Predicting detectability for all surveys
We derived the prediction of our selected model for all the surveys of our dataset (whether individuals were detected or not). When data was missing for either mean hollow height, total hollow area or tree girth (see figure legends above for numbers of trees involved), we filled the data set using the average across all the trees where data was available. Similarly, the number of surveys where time was missing equaled 8. We filled these missing (residual) time data using the average across all surveys where the residual time was available. We could then predict the parameter g of the geometric distribution for each tree and each survey, even those without observed individuals ( Figure S2.9), and the corresponding distribution of detectability (φ = 1 − g) at survey scale is provided in Figure 1A of main text.

Predicting capacity parameters (K i ) for all trees
We derived the K i parameters by simply deriving our selected geometric model prediction for each tree while artificially putting the (residual) time covariate at 0 ( Figure S2.10). This provided us with g i , which we could transform to K i using equation 7 of main text.