A Gibbs sampler for multi-species occupancy models

Clark, Allan Ernest; Altwegg, Res

doi:10.1007/s10651-023-00558-7

A Gibbs sampler for multi-species occupancy models

Open access
Published: 30 May 2023

Volume 30, pages 189–204, (2023)
Cite this article

Download PDF

You have full access to this open access article

Environmental and Ecological Statistics Aims and scope Submit manuscript

A Gibbs sampler for multi-species occupancy models

Download PDF

Allan Ernest Clark^1,2 &
Res Altwegg^1,2

1349 Accesses
Explore all metrics

Abstract

Multi-species occupancy (MSO) models use detection-nondetection data from species observed at different locations to estimate the probability that a particular species occupies a particular geographical region. The models are particularly useful for estimating the occupancy probabilities associated with rare species since they are seldom observed when undertaking field surveys. In this paper, we develop Gibbs sampling algorithms that can be used to fit various Bayesian MSO models to detection-nondetection data. Bayesian analysis of these models can be undertaken using statistical packages such as JAGS, Stan, and NIMBLE. However, since these packages were not developed specifically to fit occupancy models, one often experiences long run-times when undertaking analysis. However, we find that these packages that were not developed specifically to fit MSO models are less efficient than our special-purpose Gibbs sampling algorithms.

Neotropical Mammals and the Analysis of Occupancy and Abundance

Inhomogeneous Poisson point process for species distribution modelling: relative performance of methods accounting for sampling bias and imperfect detection

Article 11 May 2022

Comparing N-mixture models and GLMMs for relative abundance estimation in a citizen science dataset

Article Open access 19 July 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Species richness (defined as the number of distinct species in a region) is a measure of biodiversity (Dorazio et al. 2006) and is important from a conservation and management point of view since ecologists and nature conservationists use these estimates to prioritise conservation action (Zipkin et al. 2009; Goijman et al. 2015). In most cases, species richness in an area is unknown and has to be estimated using observed data. Such data are collected by sampling a region a finite number of times. Some species might be easily observed, while others might be more difficult to observe, thus making it more difficult to draw inferences about the community structure as a whole. Multi-species occupancy (MSO) models account for variable detection probabilities amongst species and are used to investigate the community structure of a group of species (Dorazio and Royle 2005; Dorazio et al. 2006, 2011; Broms et al. 2016).

MSO models use detection-nondetection data from several species observed at different locations to estimate the probability that a species occupies a location. The models are particularly useful for estimating the occupancy probabilities associated with rare species (Devarajan et al. 2020). The observed detections of these species are usually low which results in imprecise occupancy probabilities if single species occupancy models are fit to such data. MSO models utilize random effects and allow the parameters associated with the regression effects of the model to be shared. This allows one to estimate detection and occupancy regression effects for species that would otherwise not be estimable (Tobler et al. 2015; Broms et al. 2016).

MSO models are mainly used to obtain estimates of species richness in a certain geographical region but can also be used to compare the biodiversity of different regions (Zipkin et al. 2010; Sauer et al. 2013). In certain geographical regions, species richness is known or well-understood (e.g. in well-studied areas) in which case these models are also used to uncover the biological drivers of biodiversity in a region (Zipkin et al. 2009; Ruiz-Gutiérrez et al. 2010; Drouilly et al. 2018).

Two formulations of the model have been developed in the literature. The first version assumes that we know the species richness in a certain geographical region (Broms et al. 2016), and it is assumed that all species of interest have been observed at least once. The second version of the model assumes that species richness is an unknown random variable where the aim is to estimate the number of unobserved species (Dorazio and Royle 2005). Both model formulations are generally undertaken using Bayesian methods. They have extensively been used, and computer code to fit these models are freely available online (Dorazio et al. 2006; Broms et al. 2016).

In the literature to date, the modelling process is mostly undertaken using statistical packages such as WinBUGS (Lunn et al. 2000), OpenBUGS (Thomas et al. 2006), JAGS (Plummer 2003), NIMBLE (de Valpine et al. 2017) and Stan (Carpenter et al. 2017). These packages allow the user to apply Bayesian methods to many different application areas and have not been developed specifically to undertake MSO models. WinBUGS, OpenBUGS and JAGS use a combination of Gibbs sampling (Geman and Geman 1984), Metropolis-Hastings (denoted as ‘MH’) steps (Metropolis et al. 1953; Hastings 1970) as well as other custom Markov Chain Monte Carlo (MCMC) samplers (Brooks et al. 2011) to obtain posterior samples from the posterior distribution of the parameters of a Bayesian model. NIMBLE is flexible and allows the user to specify the sampling algorithms that should be used when undertaking posterior sampling. The above software packages with the exception of Stan, allow the user to sample from both discrete and continuous conditional distributions. Stan in contrast only allows posterior sampling from continuous random variables which are undertaken using the No-U-turn Hamiltonian Monte Carlo sampler (Hoffman and Gelman 2014).

Below we develop Gibbs sampling algorithms that can be used to fit MSO models. The Gibbs sampler is a MCMC method that is used to obtain samples from the posterior distribution of the parameters of a statistical model. The method can be used when all of the conditional posterior distributions of the parameters of a model are of a known form and sampling from the conditional posterior distributions are straightforward. The Gibbs sampler is known to exhibit slow mixing in certain cases, although this shortcoming is often overcome by sampling groups of parameters in a block which improves mixing (Roberts and Sahu 1997). In a single-season (single-species) nonspatial and spatial occupancy modelling context, Clark and Altwegg (2019), show that special-purpose Gibbs samplers can produce posterior chains that mix faster and have larger expected sampling rates (Holmes and Held 2006) than those obtained using JAGS and Stan. These results suggest that such algorithms could potentially lead to significant reductions in the run-times of MSO models. Doser et al. (2021) extended Clark and Altwegg (2019) and developed algorithms to fit various Bayesian occupancy models. As one of their developments, they assume that the spatial effects in the occupancy process of a single-season, single-species occupancy model can be modelled through a Gaussian process prior distribution with a Matérn covariance function. They found that for data sets with more than 1000 sites, their algorithm was prohibitively slow, in which case they used a nearest-neighbourhood Gaussian process prior distribution (Datta et al. 2016) to significantly reduce run-times when analysing large detection-nondetection data sets.

Dorazio and Rodriguez (2012) and Johnson et al. (2013) developed Gibbs algorithms to obtain posterior samples for the parameters of a nonspatial and spatial single-species occupancy model, respectively. Both approaches assume that detection and occupancy processes are modelled using probit link functions, which enables the use of data augmentation (Tanner and Wong 1987) to obtain closed form expressions of the conditional posterior distributions of the parameters of an occupancy model. Below we develop Gibbs sampling algorithms to undertake MSO models when species richness is known, and thereafter we describe how the algorithm would change when species richness is unknown. Initially, we assume that the regression effects of the detection and occupancy processes are modelled using probit link functions and after that we extend the analysis to the use of logistic link functions. We do so since we observe that, in certain model formulations, the use of probit link functions does not lead to a simple Gibbs sampling algorithm when sampling from the posterior distribution of the parameters of an MSO model. In these cases, MH steps, which require the monitoring of various additional tuning parameters, could be used. Another reason for using the logistic link function is that it allows one to more easily interpret the regression effects of the model compared to the probit model. Take note that below we do not discuss joint species distribution models (Pollock et al. 2014; Tikhonov et al. 2020) and restrict the focus to occupancy modelling. A detailed review of MSO and its use can be found in Devarajan et al. (2020).

The paper commences by discussing the Bayesian formulation of an MSO model when species richness is assumed to be known in a geographical region, while in Sect. 3, we develop a Gibbs sampler for the MSO model when species richness is unknown. In both cases, it is assumed that the detection and occupancy processes are modelled using probit link functions. In Sect. 4, we extend the previous work and consider the case when the detection and occupancy processes are modelled using logistic functions. We describe how the Gibbs algorithms are constructed and briefly comment on how MSO models can be applied to different taxa or species groups. Before concluding, we re-analyse the data collected from a camera-trapping study undertaken by Drouilly et al. (2018) in order to highlight the use of the methods developed as well.

2 Known species richness: probit link function

The following formulation of the MSO model is similar to Broms et al. (2016).^{Footnote 1} It is assumed that each site (J in total) can be surveyed numerous times, potentially with unequal survey effort, and that $n_s$ species have been observed in total. The observed data is stored in a three-dimensional ragged-array $\varvec{y}$ where $y_{i,j,k}$ represents the detection-nondetection of species i, at site j during survey k ($i=1, \ldots ,n_{s}; \;j=1, \ldots , J;\; k=1, \ldots , K_j$). The random variable $y_{i,j,k}=1$ if species i is observed at site j during visit k and 0 otherwise. The partially observed occupancy states are stored in a matrix $\varvec{z}$ where $z_{i,j}=1$ if species i occupies site j while $z_{i,j}=0$ if species i does not occupy site j. We assume that the occupancy process can be modelled using site-specific covariates ($\varvec{X}$) while the detection process can be modelled using survey-specific covariates ($\varvec{V}$) where it is assumed that these covariates do not vary by species.

The model can now be specified using the following Bayesian hierarchical model,

$$\begin{aligned} z_{i,j}\vert \psi _{i,j}&\sim \text {Bernoulli}(\psi _{i,j}), \; i=1, \ldots ,n_s; \;\; j=1,\ldots ,J, \\ y_{i,j,k}\vert z_{i,j},p_{i,j,k}&\sim \text {Bernoulli}(z_{i,j}p_{i,j,k}), k=1, \ldots , K_j, \end{aligned}$$

where occupancy and detection probabilities are modelled using probit link functions, $\psi _{i,j} = \Phi (\varvec{x}_j^T \varvec{\beta }_i)$ and $p_{i,j,k} = \Phi (\varvec{v}_{j,k}^T \varvec{\alpha }_i)\; \forall (i,j,k)$. The Bayesian formulation is completed by specifying the prior distributions used. The regression effects of the detection and occupancy processes are modelled using multivariate Gaussian distributions and we specifically assume that

$$\begin{aligned} \varvec{\alpha }_i \vert \varvec{\mu }_{\varvec{\alpha }},\sigma ^2_{\varvec{\alpha }}&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\alpha }}, \sigma ^2_{\varvec{\alpha }} \varvec{I}_{n_d})\; \text {with}\; \varvec{\mu }_{\varvec{\alpha }} \sim \mathcal {N}(\varvec{0}, a^2 \varvec{I}_{n_d}) \; \text {and}\\ \varvec{\beta }_i \vert \varvec{\mu }_{\varvec{\beta }},\sigma ^2_{\varvec{\beta }}&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }}, \sigma ^2_{\varvec{\beta }} \varvec{I}_{n_o}) \; \text {with}\; \varvec{\mu }_{\varvec{\beta }} \sim \mathcal {N}(\varvec{0}, b^2 \varvec{I}_{n_o}). \end{aligned}$$

The coefficient vectors $\varvec{\mu }_{\varvec{\alpha }}$ and $\varvec{\mu }_{\varvec{\beta }}$ represent the community-level effects associated with the detection and occupancy processes respectively, and are each assigned a common Gaussian hyper-prior distribution. By doing so, one allows the regression effects to share information since often the detection and occupancy probabilities are not estimable for rare animal species due to lack of detections (Tobler et al. 2015).

Finally, $\sigma _{\varvec{\alpha }}$ and $\sigma _{\varvec{\beta }}$ are modelled using a half-Cauchy distribution (Gelman 2006) and can be specified hierarchically as

$$\begin{aligned} \sigma _{\varvec{\alpha }}^2\vert \nabla _{\varvec{\alpha }} \sim \text {IG}\left( 0.5, \frac{1}{\nabla _{\varvec{\alpha }}}\right) ,\;&\nabla _{\varvec{\alpha }} \sim \text {IG}\left( 0.5, \frac{1}{A^2} \right) \; \text {and},\end{aligned}$$

(1)

$$\begin{aligned} \sigma _{\varvec{\beta }}^2\vert \nabla _{\varvec{\beta }} \sim \text {IG}\left( 0.5, \frac{1}{\nabla _{\varvec{\beta }}}\right) ,\;&\nabla _{\varvec{\beta }} \sim \text {IG}\left( 0.5, \frac{1}{B^2} \right) , \end{aligned}$$

(2)

where A and B are known hyper-parameters.

Posterior samples can be obtained from the above MSO model by using a MH algorithm, although we instead derive a Gibbs algorithm that uses the data augmentation scheme developed by Albert and Chib (1993) to represent a probit link function by means of latent Gaussian random variables. The full conditional distributions needed to undertake a Gibbs sampler are all of a familiar form and are shown in the Supplementary Information (refer to Appendix 1.A.1).^{Footnote 2} In Supplementary Appendix 1.A.2 we also show how similar conditional posterior distributions can be obtained if site-specific covariates are available but no survey-specific covariates are available for the detection process. In this case, site-specific covariates are used as proxies for the survey-specific covariates.

3 Unknown species richness: probit link function

Dorazio and Royle (2005) developed a model that could be used to estimate the species richness (denoted as N) in a geographical region. They assume that there exists a super-population of species (denoted as S) that consists of the observed species, $n_s$, as well as additional unseen species, $S-n_s$. Recall that in Sect. 2, the survey occasion was explicitly included in the model formulation. Here all information relating to survey occasions are grouped together such that the observed data $\varvec{y}_{n_s}$ consists of a $n_s \times J$ matrix^{Footnote 3} of the total number of detections for each species at each location (from $K_j$ surveys, $j=1, \ldots , J$). Since N is unknown, an $(S-n_s) \times J$ matrix of zeros are introduced, which represents the data associated with the unobserved species.

A latent indicator variable $w_i$ is introduced, which takes on the value 1 if species i in the super-population occurs in the geographical region under investigation and 0 otherwise. From the above discussion, $w_i=1\; \text {for} \; i=1, \ldots ,n_s$ and the species richness in a geographical region is $N=\sum _{i=1}^S w_i$. The latent variable $z_{i,j}$ takes on the value 1 if species i occupies location j and 0 otherwise. The $w_i$ indicator is modelled using a Bernoulli distribution with success parameter $\Omega$ while $z_{i,j}\vert w_i$ is modelled as a Bernoulli random variable with success probability, $w_i \psi _{i,j}$. The detection process (conditional on $w_i=1, z_{i,j}=1$) is modelled using a binomial distribution such that

$$\begin{aligned} p(y_{i,j}\vert z_{i,j}=1, p_{i,j}, w_i=1)&= {{K_j}\atopwithdelims (){y_{i,j}}} p_{i,j}^{y_{i,j}} (1-p_{i,j})^{K_j - y_{i,j}}. \end{aligned}$$

(3)

Using the above description it is clear that the model can be formulated hierarchically as

$$\begin{aligned} w_i&\sim \text {Bernoulli}(\Omega ),\; \forall i=1,\ldots ,S,\\ z_{i,j}\vert w_i, \psi _{i,j}&\sim \text {Bernoulli}(w_i\psi _{i,j}), \; \forall i=1, \ldots ,S; \;\; \forall j=1,\ldots ,J, \\ y_{i,j}\vert z_{i,j},w_i, p_{i,j}&\sim \text {Binomial}(K_j, \; w_i z_{i,j}p_{i,j}), \\ p_{i,j}&= \Phi (\varvec{v}_{j}^T \varvec{\alpha }_i) \; \text {with}\; \varvec{\alpha }_i \sim \mathcal {N}(\varvec{\mu }_{\varvec{\alpha }}, \sigma ^2_{\varvec{\alpha }} \varvec{I}_{n_d}),\\ \psi _{i,j}&= \Phi (\varvec{x}_j^T \varvec{\beta }_i) \; \text {with}\; \varvec{\beta }_i \sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }}, \sigma ^2_{\varvec{\beta }} \varvec{I}_{n_o}), \end{aligned}$$

where $\varvec{v}_j$ and $\varvec{x}_j$ variables are site-specific covariates used to model the detection and occupancy processes respectively.

Since the detection process is a binomial distribution, the use of the data augmentation method developed previously to sample from the detection regression effects is more cumbersome to use since the number of latent variables required to perform the data augmentation step increases from 1 to $K_j$ for each i. The remaining parameters however have simple conditional posterior distributions if we use data augmentation to obtain the posterior distribution of the occupancy regression coefficients. A derivation of the resulting conditional posterior distributions can be found in Supplementary Appendix 1.B.

4 The use of logistic link functions in multi-species occupancy models

In Supplementary Appendices 1.C and 1.D we show that if logistic link functions are used to model the regression effects of the detection and occupancy processes, a Gibbs sampling algorithm with closed-form conditional posterior distributions for all parameters of the MSO models in Sects. 2 and 3, can be derived. A Gibbs sampler is preferable to the use of a MH algorithm since the former does not require any ‘tuning’ when undertaking the MCMC sampling.

As an example, consider the MSO model developed in Sect. 3 but assume that the regression effects for detection and occupancy are modelled as

$$\begin{aligned} \text {logit}(p_{i,j})&= \varvec{v}_{j}^T \varvec{\alpha }_i \, \text {where}\, \varvec{\alpha}_i \sim \mathcal {N}\bigg (\varvec{\mu }_{\varvec{\alpha }}, \sigma ^2_{\varvec{\alpha }} \varvec{I}_{n_d}\bigg ),\\ \text {logit}(\psi _{i,j})&= \varvec{x}_j^T \varvec{\beta }_i \, \text {where}\, \varvec{\beta}_i \sim \mathcal {N}\bigg (\varvec{\mu }_{\varvec{\beta }}, \sigma ^2_{\varvec{\beta }} \varvec{I}_{n_o}\bigg ), \end{aligned}$$

and both $\varvec{\mu }_{\varvec{\alpha }}$ and $\varvec{\mu }_{\varvec{\beta }}$ are modelled hierarchically as done previously. The conditional posterior distributions of most of the parameters remain the same as those defined in Sect. 3 although the conditional posterior distributions of $\varvec{\alpha }_i\vert .$ and $\varvec{\beta }_i\vert .$ are slightly amended as detailed below. When $w_i=0$, $\varvec{\beta }_i\vert . \sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }}, \sigma _{\varvec{\beta }}^2 \varvec{I}_{n_o})$, while when $w_i=1$, posterior samples of $\varvec{\beta }_i$ can be obtained by sampling from a Pólya-Gamma distribution and Gaussian distribution in turn. Specifically,

$$\begin{aligned} \varvec{\omega }_{i,j}^{(\varvec{\beta })}\vert \varvec{\beta }, w_i=1,.&\sim \text {PG}(1, \varvec{x}_j^T \varvec{\beta }_i), \, \forall j=1, \ldots , J\, \text {and}\\ \varvec{\beta }_i\vert \varvec{y}, \varvec{\omega }_i^{(\varvec{\beta })}, w_i=1,.&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }_i}, \, \varvec{\Sigma }_{\varvec{\beta }_i}), \end{aligned}$$

where the ‘PG’notation denotes a Pólya-Gamma random variable, $\varvec{\omega }_i^{(\varvec{\beta })}= \left[ \omega _{i,1}^{(\varvec{\beta })}, \ldots , \omega _{i,J}^{(\varvec{\beta })}\right] ^T$, $\tau _{\varvec{\beta }}=\frac{1}{\sigma ^2_{\pmb \beta }}$ and

$$\begin{aligned} \varvec{\Sigma }_{\varvec{\beta }_i}&= \left( \tau _{\varvec{\beta }} \varvec{I}_{n_o} + \varvec{X}^T \varvec{D}_{\varvec{\beta }} \varvec{X}\right) ^{-1}, \\ \varvec{\mu }_{\varvec{\beta }_i}&= \varvec{\Sigma }_{\varvec{\beta }_i} \left( \varvec{X}^T \varvec{\Theta }_{\varvec{\beta }} + \tau _{\varvec{\beta }} \varvec{\mu }_{\varvec{\beta }}\right) . \end{aligned}$$

The matrices $\varvec{D}_{\varvec{\beta }}$ and $\varvec{\Theta }_{\varvec{\beta }}$ are defined as, $\varvec{D}_{\varvec{\beta }}= \text {diagonal}\left[ \omega _{i,1}^{(\varvec{\beta })}, \ldots , \omega _{i,J}^{(\varvec{\beta })}\right]$ and $\varvec{\Theta }_{\varvec{\beta }} = \left[ z_{i,1}-1/2, \ldots , z_{i,J}-1/2\right] ^T$. The conditional posterior distribution of $\varvec{\alpha }_i\vert .$ can be obtained similarly such that

$$\begin{aligned} \varvec{\omega }_{i,j}^{(\varvec{\alpha })}\vert \varvec{\alpha }, w_i=1,.&\sim \text {PG}(K_j, \varvec{x}_j^T \varvec{\alpha }_i), \, \forall j=1, \ldots , j^*,\\ \varvec{\alpha }_i\vert \varvec{y}, \varvec{\omega }_i^{(\varvec{\alpha })} , w_i=1,.&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\alpha }_i}, \, \varvec{\Sigma }_{\varvec{\alpha }_i}), \end{aligned}$$

where $\varvec{\omega }_i^{(\varvec{\alpha })}= \left[ \omega _{i,1}^{(\varvec{\alpha })}, \ldots , \omega _{i,j^*}^{(\varvec{\alpha })}\right] ^T$, $\tau _{\varvec{\alpha }}=\frac{1}{\sigma ^2_{\pmb \alpha }}$ and

$$\begin{aligned} \varvec{\Sigma }_{\varvec{\alpha }_i}&= \left( \tau _{\varvec{\alpha }} \varvec{I}_{n_d} + \tilde{\varvec{V}}^T \varvec{D}_{\varvec{\alpha }} \tilde{\varvec{V}}\right) ^{-1} , \\ \varvec{\mu }_{\varvec{\alpha }_i}&= \varvec{\Sigma }_{\varvec{\alpha }_i} \left( \tilde{\varvec{V}}^T \varvec{\Theta }_{\varvec{\alpha }} + \tau _{\varvec{\alpha }}\varvec{\mu }_{\varvec{\alpha }}\right) , \end{aligned}$$

where $\tilde{\varvec{V}}$ is defined as the elements of $\varvec{V}$ associated with $z_{i,j}=1$ while the matrices $\varvec{D}_{\varvec{\alpha }}$ and $\varvec{\Theta }_{\varvec{\alpha }}$ are defined as, $\varvec{D}_{\varvec{\alpha }}= \text {diagonal}\left[ \omega _{i,1}^{(\varvec{\alpha })}, \ldots , \omega _{i,j^*}^{(\varvec{\alpha })}\right]$ and $\varvec{\Theta }_{\varvec{\alpha }} = \left[ y_{i,1}-K_1/2, \ldots , y_{i,j^*}-K_{j^*}/2\right] ^T$ for an appropriate value of $j^*$. Complete derivations of the conditional posterior distributions of the MSO model can be found in Supplementary Appendix 1.C while Supplementary Appendix 1.D contains the required conditional distributions when species richness is assumed known. In the above formulations of the MSO model, it is assumed that both the detection and occupancy regression effects are modelled as realisations from a common Gaussian distribution. These models can easily be extended by adding one additional level of complexity and assuming that the regression effects of different groups (e.g. bird guilds, taxa or species groups) are modelled as realisations from different Gaussian distributions. The required algorithm is included in Supplementary Appendix 1.D.2. The above model formulations can easily be extended to account for a Bernoulli detection process or the inclusion of spatial random effects in either the detection or occupancy processes.

5 Application

As an application of the methods developed, we re-analyse the data collected from a camera-trapping study undertaken by Drouilly et al. (2018). The Anysberg Nature Reserve (33$^{\circ }$ 31$^\prime$ S, 20$^{\circ }$ 37$^\prime$ E) in South Africa (referred to as Anysberg) is the study area. The camera trapping design consisted of 156 camera trap sites (deployed from the end of September 2013 to May 2014) each placed at a location within a 2 km grid design (Kinnaird and O’Brien 2012). Data were collected using Bushnell Trophy CAM HG (Bushnell Outdoor Products, Overland Park, Kansas) camera traps and consisted of the processing of camera images obtained when cameras were triggered in the field. The study specifically focused on terrestrial vertebrates $> 0.5$ kg in mass. A detailed explanation of camera trap survey design can be found in Sect. 2.2 of Drouilly et al. (2018). A list of the species observed at least once at Anysberg during the course of the study can be found in Supplementary Appendix 1.E.

Since the number of detections obtained daily was low, it was decided to define a sampling occasion as a 6 day period. From the captured images an observation matrix was constructed which consisted of the observed number of times species i (for $i=1, \ldots , n_s=35$) was detected at site j ($j=1, \ldots , J=156$) during $K_j$ sampling occasions. The observed data is stored as $\left[ y_{i,j}\right]$.

Below we use the observed species at Anysberg and relate the occupancy and detection processes to explanatory variables. We assume that species richness is known and fit the MSO model described in Sect. 2 (assuming logistic link functions) to the data. The occupancy process is modelled as a function of elevation (measured as metres above sea-level), the modified soil-adjusted index (MSAVI2) (Qi et al. 1994) and a PreyIndex.^{Footnote 4} MSAVI2 is a vegetation index similar to the normalised difference vegetation index. MSAVI2 was converted into a binary indicator variable (named MSAVI2ind) such that MSAVI2ind equals 1 if the standardised variable MSAVI2 is positive. The detection process is modelled as a function of a binary indicator variable, trail and a habitat factor variable. The trail variable equals 1 when a camera is placed on a trail and 0 otherwise. The habitat factor variable has three levels, namely, plain, river and mountain, where the mountain level is used as the baseline.

We fit all model combinations (32 in total, see Supplementary Appendix 1.F) and assume that all models include an intercept in the detection and occupancy process. We further assume that we are in a $\mathcal {M}$-complete setting (Bernardo and Smith 1994) and acknowledge that other variables might well be the true drivers of both the occupancy and detection process but choose to use the ones listed above. All MCMC sampling was undertaken using our code. For all models, three chains of 100,000 draws were used with a burn-in proportion of one-third. The following hyper-parameters were used: $a^2=2.25^2=b^2$ and $A^2=2.25^2=B^2$. MSO models produce significant amounts of outputs and are computationally expensive. Because of this, we thinned the resulting MCMC chains using a factor of ten. Various convergence tests^{Footnote 5} were undertaken to assess that the MCMC chains had converged (Smith 2007).

Exploratory (with-in sample) model selection was initially undertaken by first considering the WAIC (Watanabe 2010) and the approximate leave-one-out cross-validation scores (LOO) using Pareto smoothed importance sampling (Vehtari et al. 2017) for each of the candidate models. In order to assess the variability of the model selection statistics, ten independent MCMC runs were undertaken, and two goodness-of-fit statistics, namely the Bayesian p-value (Meng 1994) and the sampled Bayesian p-value (Zhang 2014) were also calculated. Gosselin (2011) shows that, asymptotically, the distribution of the sampled Bayesian p-value is uniformly distributed over the interval [0, 1] which suggests that misspecified models will not have uniformly distributed sample Bayesian p-values. The Bayesian posterior probability of each of the models were calculated using the version of the Reversible jump MCMC (RJMCMC) algorithm developed by Barker and Link (2013), which were then used to undertake Bayesian model averaging (Hoeting et al. 1999). We assume a priori that each model is equally likely. Refer to Supplementary Appendix 1.G for more details regarding the above mentioned model selection methods.

5.1 Application results

Both model selection measures require the calculation of the variance of the logarithm of the integrated likelihood evaluated using the posterior samples at each MCMC iteration. Based on ten independent MCMC runs, we found that both model selection measures were not accurately estimated since the previous mentioned variance calculation was often greater than 0.4 thus rendering both model selection methods unreliable (Vehtari et al. 2017).

The variability of the Bayesian p-values was of an order of magnitude smaller than those observed for the sampled Bayesian p-values (see Supplementary Appendix 1.H). Anderson-Darling tests (Anderson and Darling 1954) using the sampled Bayesian p-values obtained from ten independent MCMC runs, indicating that the assumption that the sampled Bayesian p-values are uniformly distributed over the interval [0, 1] cannot be rejected for fifteen of the thirty two models (assuming a 30% significance level).^{Footnote 6} These models have been highlighted in grey in Fig. 1, which displays the boxplots of the sampled Bayesian p-values based on ten independent MCMC runs. The models and the p-values of the Anderson–Darling tests are tabulated in Table 1. Given the small sample size (of ten), these tests tentatively suggest that models that only include an intercept term in the occupancy process are misspecified. These results also suggest that none of the variables considered always lead to misspecified models and thus all variables do appear to have some predictive ability of occupancy for the species considered in this data set.

Table 1 The list of models associated with non-significant Anderson–Darling tests when testing the null hypothesis that the distribution of the sampled Bayesian p-values are uniformly distributed over the range [0, 1]; the respective p-value of the test as well as the median sampled Bayesian p-value of the associated models

Full size table

Conn et al. (2018) suggest that the Bayesian p-values should not be used to undertake model selection, but should rather be used to identify models that display clear signs of model misspecification. In our context, models 3, 4, 21, 25, 26, 29, and 30 appear to be misspecified since they either have a Bayesian p-value above 0.8 or are smaller than 0.2. These models either only include an intercept term in the detection or occupancy process; do not include the elevation variable in the occupancy process or only includes the trail variable in the detection process.

Table 2 The posterior model probabilities for the five best models obtained using RJMCMC

Full size table

The resulting posterior model probabilities from the RJMCMC algorithm are displayed in Table 2. The models listed contain a combination of the elevation and PreyIndex variable in the occupancy process as well as a combination of the indicator variables trail, plain and river in the detection process. From these results, we observed that Model 6 has the largest posterior probability with a value of 0.85, Model 8 has the second largest posterior probability with a value of 0.14 while all other models have very low posterior mass associated with them.

From an examination of the population regression effects in Table 4 of Supplementary Appendix 1.I it can be seen that the mean population regression effects for plain and river in the detection process (of Models 8 and 24) both have 95% credibility intervals that contain zero. Similarly, the PreyIndex variable in Models 22 and 24 also have 95% credibility intervals that contain zero. Table 3 contains the Bayesian model averaged population regression effects^{Footnote 7} as well as the posterior effect probabilities (the probability that a population regression effect is non-zero). The results clearly indicate that trail and elevation have non-zero regression effects in the detection and occupancy processes respectively while the remaining regression effects have a high probability of being zero. These results indicate that on average one is more likely to detect one of the animal species investigated off a trail than on it and that on average no significant differences in detectability of animal species are observed between the different habitat types. We found that the mean population regression effect for elevation is negative, indicating that on average higher occupancy probabilities are observed at lower elevation levels than at higher ones.

Table 3 The Bayesian model averaged community-level regression effects. (Posterior mean, Standard deviation as well as posterior effect probabilities.) Regression effects cannot be calculated if $\text {Pr}(\varvec{\beta }\not = \varvec{0} \vert \text {Data})=0$ and is captured as ‘–’

Full size table

Figure 2 displays caterpillar plots highlighting the mean community regression effects as well as the 95% credibility intervals for the species-specific regression effects of trail (a) and elevation (b) for the detection and occupancy processes respectively. Of the 35 species photographed, the occupancy probability was strongly related to elevation for 19 species. The results indicate that Smith’s rock hare (Pronolagus rupestris), klipspringers (Oreotragus oreotragus) and leopard’s (Panthera pardus) were more likely to occupy high lying regions of the reserve compared to many of the antelope species that preferred low-lying regions of the reserve. In general, the detection of most carnivore species were negatively related to trail while the detection of species such as the hyrax (Procavia capensis) and the duiker (Sylvicapra grimmia) were positively related to trail. Plots displaying the estimated mean occupancy probability of the different species can be found in Supplementary Appendix 1.J.

We found that the above results were not sensitive to the specification of the prior distribution. To this end, in Supplementary Appendix 1.K we display statistics pertaining to the community-level regression effects for Model 6 when run using three chains of 100,000 in length. A burn-in sample of one-third is specified. Thereafter the resulting chains were thinned by retaining every 10th observation. The different prior distributions were specified as follows: Prior 1: $a^2=2.25^2=b^2$, $A^2=2.25^2=B^2$; Prior 2: $a^2=2.25^2=b^2$, $A^2=2.25=B^2$; Prior 3: $a^2=5^2=b^2$, $A^2=5^2=B^2$; Prior 4: $a^2=5^2=b^2$, $A^2=5=B^2$.

6 Discussion and conclusions

Multi-species occupancy models are important tools for statistical ecologists since they allow one to undertake community-level as well as species-specific inference by using detection-nondetection data of many species at different locations (Dorazio and Royle 2005; Dorazio et al. 2006; Broms et al. 2016). The information across species are pooled together, which allows one to obtain occupancy probabilities of rare species, which would otherwise not be possible due to the low detectability of these species.

Here, we develop Gibbs sampling algorithms when probit or logistic link functions are used to model the detection and occupancy processes respectively, and show how this can be done for a number of different versions of the MSO models. We also specifically show that the algorithms for MSO models (that use a logistic link function) are similar to those developed for single-season occupancy models as well as single-season spatial occupancy models where all algorithms use the work of Polson et al. (2013) and Clark and Altwegg (2019) to develop efficient Gibbs sampling algorithms.

In our application, we found that model selection methods such as WAIC (Watanabe 2010) and the approximate leave-one-out cross-validation scores using Pareto smoothed importance sampling (Vehtari et al. 2017) were unreliable for the data considered, a possible reason for this being the sparsity of data at several sites. Model selection was undertaken using Reversible jump MCMC (Barker and Link 2013), while the occupancy surfaces for all species considered were obtained using Bayesian model averaging (see Supplementary Appendix 1.J).

A single MSO model took approximately 12–13 min to fit compared to approximately 4 h when using Stan. All calculations were performed using R 4.0.4 (Core 2014) on a Windows 10 desktop with a AMD Ryzen 7 3700X 8-Core Processor, a clock speed of 3.59 GHz and 64 GB of RAM. On the same system, the RJMCMC algorithm took approximately one hour to complete. These run-times were not found to be prohibitively long, and thus we would encourage others to consider using the technique when undertaking model selection. The authors used their own code to undertake the RJMCMC algorithm, although the R package rjmcmc (Gelling et al. 2019) could also be used.

Data availability

Data and code can be shared on request.

Notes

$n_s=$ the number of observed species; $J=$ the number of sites; $K_j=$ the number of surveys undertaken at site j; $\sum _j=K_j=$ the total surveys undertaken; $n_d=$ the number of detection covariates (including an intercept); $n_o=$ the number of occupancy covariates (including an intercept); $\varvec{X}$ is a $J \times n_o$ matrix; $\varvec{z}$ is a $n_s \times J$ matrix; $\varvec{V}$ is a $\left( \sum _{j=1}^{J} K_j \times J\right) \times n_d$ matrix.
All references to Appendices are appendices in the Supplementary Information.
Matrices and vectors are displayed using square braces and are bolded throughout the manuscript. Example an n dimensional vector is denoted as:
$$\begin{aligned} \varvec{m} = [m_1, \ldots , m_n]^T = \left[ \begin{matrix} m_1 \\ \vdots \\ m_n \end{matrix} \right] , \end{aligned}$$
The PreyIndex variable was constructed as the number of independent pictures for certain types of prey species considered in the study divided by the number of camera trap nights across the whole survey multiplied by 100. Pictures were assumed to be independent if they were taken more than 30 min apart.
Trace plots were examined were examined to informally assess the mixing of the three chains while Geweke and Brooks–Gelman–Rubin tests were performed to assess the burn-in sample size and assess convergence of the Markov chains respectively.
A liberal significance level was selected since the sample size is very small.
From Raftery (1993), $\hat{\triangle }_k = \mathbb {E}(\triangle _k\vert \mathcal {M}_k, \text {Data})$ such that $\mathbb {E}(\triangle \vert \text {Data})= \sum _k \mathbb {E}(\triangle _k\vert \mathcal {M}_k)\text {Pr}(\mathcal {M}_k\vert \text {Data})$ and $\text {var}(\triangle \vert \text {Data}) =\left( \sum _k \left( \text {var}(\triangle \vert \mathcal {M}_k, \text {Data}) - \hat{\triangle }_k^2 \right) \text {Pr}(\mathcal {M}_k\vert \text {Data})\right) - \mathbb {E}(\triangle _k\vert \mathcal {M}_k, \text {Data})^2$. The posterior effect probabilities are obtained by summing the posterior model probabilities across models for each variable (Hoeting et al. 1999).

References

Albert JH, Chib S (1993) Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 88(422):669–679
Article Google Scholar
Anderson TW, Darling DA (1954) A test of goodness of fit. J Am Stat Assoc 49(268):765–769
Article Google Scholar
Barker RJ, Link WA (2013) Bayesian multimodel inference by RJMCMC: a Gibbs sampling approach. Am Stat 67(3):150–156
Article Google Scholar
Bernardo JM, Smith AF (1994) Bayesian theory, vol 405. Wiley, Hoboken
Book Google Scholar
Broms KM, Hooten MB, Fitzpatrick RM (2016) Model selection and assessment for multi-species occupancy models. Ecology 97(7):1759–1770
Article PubMed Google Scholar
Brooks S, Gelman A, Jones G, Meng X-L (2011) Handbook of Markov chain Monte Carlo. Chapman and Hall, Cambridge
Book Google Scholar
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017) Stan: a probabilistic programming language. J Stat Softw 76(1):1–32
Article PubMed PubMed Central Google Scholar
Clark AE, Altwegg R (2019) Efficient Bayesian analysis of occupancy models with logit link functions. Ecol Evol 9(2):756–768
Article PubMed PubMed Central Google Scholar
Conn PB, Johnson DS, Williams PJ, Melin SR, Hooten MB (2018) A guide to Bayesian model checking for ecologists. Ecol Monogr 88(4):526–542
Article Google Scholar
Datta A, Banerjee S, Finley AO, Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 111(514):800–812
Article CAS PubMed PubMed Central Google Scholar
de Valpine P, Turek D, Paciorek C, Anderson-Bergman C, Temple Lang D, Bodik R (2017) Programming with models: writing statistical algorithms for general model structures with NIMBLE. J Comput Graph Stat 26(2):403–413
Article Google Scholar
Devarajan K, Morelli TL, Tenan S (2020) Multi-species occupancy models: review, roadmap, and recommendations. Ecography 43(11):1612–1624
Article Google Scholar
Dorazio RM, Rodriguez DT (2012) A Gibbs sampler for Bayesian analysis of site-occupancy data. Methods Ecol Evol 3(6):1093–1098
Article Google Scholar
Dorazio RM, Royle JA (2005) Estimating size and composition of biological communities by modeling the occurrence of species. J Am Stat Assoc 100(470):389–398
Article CAS Google Scholar
Dorazio RM, Royle JA, Söderström B, Glimskär A (2006) Estimating species richness and accumulation by modeling species occurrence and detectability. Ecology 87(4):842–854
Article PubMed Google Scholar
Dorazio RM, Gotelli NJ, Ellison AM (2011) Modern methods of estimating biodiversity from presence–absence surveys. Biodiversity loss in a changing planet. InTech, Rijeka, pp 277–302
Google Scholar
Doser JW, Finley AO, Kéry M, Zipkin EF (2021) spOccupancy: an R package for single species, multispecies, and integrated spatial occupancy models. Preprint at http://arxiv.org/abs/2111.12163
Drouilly M, Clark A, O’Riain MJ (2018) Multi-species occupancy modelling of mammal and ground bird communities in rangeland in the Karoo: a case for dryland systems globally. Biol Conserv 224:16–25
Article Google Scholar
Gelling N, Schofield MR, Barker RJ (2019) R package rjmcmc: reversible jump MCMC using post-processing. Aust N Z J Stat 61(2):189–212
Article Google Scholar
Gelman A (2006) Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal 1(3):515–534
Article Google Scholar
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–741
Article CAS PubMed Google Scholar
Goijman AP, Conroy MJ, Bernardos JN, Zaccagnini ME (2015) Multiseason regional analysis of multi-species occupancy: implications for bird conservation in agricultural lands in east-central argentina. PLoS ONE 10(6):e0130874
Article PubMed PubMed Central Google Scholar
Gosselin F (2011) A new calibrated Bayesian internal goodness-of-fit method: sampled posterior p-values as simple and general p-values that allow double use of the data. PLoS ONE 6(3):e14770
Article CAS PubMed PubMed Central Google Scholar
Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109
Article Google Scholar
Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Stat Sci 14(4):382–417. https://doi.org/10.1214/ss/1009212519
Article Google Scholar
Hoffman MD, Gelman A (2014) The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J Mach Learn Res 15(1):1593–1623
Google Scholar
Holmes CC, Held L (2006) Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal 1(1):145–168
Google Scholar
Johnson DS, Conn PB, Hooten MB, Ray JC, Pond BA (2013) Spatial occupancy models for large data sets. Ecology 94(4):801–808
Article Google Scholar
Kinnaird MF, O’Brien TG (2012) Effects of private-land use, livestock management, and human tolerance on diversity, distribution, and abundance of large African mammals. Conserv Biol 26(6):1026–1039
Article PubMed Google Scholar
Lunn DJ, Thomas A, Best N, Spiegelhalter D (2000) WinBUGS—a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput 10(4):325–337
Article Google Scholar
Meng X-L (1994) Posterior predictive p-values. Ann Stat 22(3):1142–1160
Article Google Scholar
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092
Article CAS Google Scholar
Plummer M (2003) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Hornik K, Leisch F Zeileis A (eds) Proceedings of the 3rd international workshop on distributed statistical computing. http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings
Pollock LJ, Tingley R, Morris WK, Golding N, O’Hara RB, Parris KM, Vesk PA, McCarthy MA (2014) Understanding co-occurrence by modelling species simultaneously with a joint species distribution model (JSDM). Methods Ecol Evol 5(5):397–406
Article Google Scholar
Polson NG, Scott JG, Windle J (2013) Bayesian inference for logistic models using Pólya-Gamma latent variables. J Am Stat Assoc 108(504):1339–1349
Article CAS Google Scholar
Qi J, Chehbouni A, Huerte A, Kerr Y, Sorooshian S (1994) A modified soil adjusted vegetation index. Remote Sens Environ 48(2):119–126
Article Google Scholar
R Core Team (2014) R: a language and environment for statistical computing [computer software manual]. http://www.R-project.org/
Raftery AE (1993) Bayesian model selection in structural equation models. Sage Focus Ed 154:163–163
Google Scholar
Roberts GO, Sahu SK (1997) Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler. J R Stat Soc Ser B (Stat Methodol) 59(2):291–317
Article Google Scholar
Ruiz-Gutiérrez V, Zipkin EF, Dhondt AA (2010) Occupancy dynamics in a tropical bird community: unexpectedly high forest use by birds classified as non-forest species. J Appl Ecol 47(3):621–630
Article Google Scholar
Sauer JR, Blank PJ, Zipkin EF, Fallon JE, Fallon FW (2013) Using multi-species occupancy models in structured decision making on managed lands. J Wildl Manag 77(1):117–127
Article Google Scholar
Smith BJ (2007) boa: an R package for MCMC output convergence assessment and posterior inference. J Stat Softw 21(11):1–37
Article Google Scholar
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. J Am Stat Assoc 82(398):528–540
Article Google Scholar
Thomas A, O’Hara B, Ligges U, Sturtz S (2006) Making BUGS open. R news 6(1):12–17
Google Scholar
Tikhonov G, Opedal ØH, Abrego N, Lehikoinen A, de Jonge MM, Oksanen J, Ovaskainen O (2020) Joint species distribution modelling with the R-package Hmsc. Methods Ecol Evol 11(3):442–447
Article PubMed PubMed Central Google Scholar
Tobler MW, Zúñiga Hartley A, Carrillo-Percastegui SE, Powell GV (2015) Spatiotemporal hierarchical modelling of species and occupancy using camera trap data. J Appl Ecol 52(2):413–421
Article Google Scholar
Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27(5):1413–1432
Article Google Scholar
Watanabe S (2010) Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res 11:3571–3594
Google Scholar
Zhang JL (2014) Comparative investigation of three Bayesian p-values. Comput Stat Data Anal 79:277–291
Article Google Scholar
Zipkin EF, DeWan A, Andrew Royle J (2009) Impacts of forest fragmentation on species richness: a hierarchical approach to community modelling. J Appl Ecol 46(4):815–822
Article Google Scholar
Zipkin EF, Royle JA, Dawson DK, Bates S (2010) Multi-species occurrence models to evaluate the effects of conservation and management actions. Biol Conserv 143(2):479–484
Article Google Scholar

Download references

Funding

Open access funding provided by University of Cape Town.

Author information

Authors and Affiliations

Department of Statistical Sciences, University of Cape Town, Private Bag X3, Rondebosch, 7701, Cape Town, South Africa
Allan Ernest Clark & Res Altwegg
The Centre for Statistics in Ecology, Environment and Conservation, University of Cape Town, Cape Town, South Africa
Allan Ernest Clark & Res Altwegg

Authors

Allan Ernest Clark
View author publications
You can also search for this author in PubMed Google Scholar
Res Altwegg
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Allan Ernest Clark.

Ethics declarations

Conflict of interest

We have no conflict of interest to declare.

Additional information

Handling Editor: Luiz Duczmal.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 728 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Clark, A.E., Altwegg, R. A Gibbs sampler for multi-species occupancy models. Environ Ecol Stat 30, 189–204 (2023). https://doi.org/10.1007/s10651-023-00558-7

Download citation

Received: 08 June 2022
Revised: 07 April 2023
Accepted: 21 April 2023
Published: 30 May 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s10651-023-00558-7

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A Gibbs sampler for multi-species occupancy models

Abstract

Similar content being viewed by others

Neotropical Mammals and the Analysis of Occupancy and Abundance

Inhomogeneous Poisson point process for species distribution modelling: relative performance of methods accounting for sampling bias and imperfect detection

Comparing N-mixture models and GLMMs for relative abundance estimation in a citizen science dataset

1 Introduction

2 Known species richness: probit link function

3 Unknown species richness: probit link function

4 The use of logistic link functions in multi-species occupancy models