1 Introduction

Species richness (defined as the number of distinct species in a region) is a measure of biodiversity (Dorazio et al. 2006) and is important from a conservation and management point of view since ecologists and nature conservationists use these estimates to prioritise conservation action (Zipkin et al. 2009; Goijman et al. 2015). In most cases, species richness in an area is unknown and has to be estimated using observed data. Such data are collected by sampling a region a finite number of times. Some species might be easily observed, while others might be more difficult to observe, thus making it more difficult to draw inferences about the community structure as a whole. Multi-species occupancy (MSO) models account for variable detection probabilities amongst species and are used to investigate the community structure of a group of species (Dorazio and Royle 2005; Dorazio et al. 2006, 2011; Broms et al. 2016).

MSO models use detection-nondetection data from several species observed at different locations to estimate the probability that a species occupies a location. The models are particularly useful for estimating the occupancy probabilities associated with rare species (Devarajan et al. 2020). The observed detections of these species are usually low which results in imprecise occupancy probabilities if single species occupancy models are fit to such data. MSO models utilize random effects and allow the parameters associated with the regression effects of the model to be shared. This allows one to estimate detection and occupancy regression effects for species that would otherwise not be estimable (Tobler et al. 2015; Broms et al. 2016).

MSO models are mainly used to obtain estimates of species richness in a certain geographical region but can also be used to compare the biodiversity of different regions (Zipkin et al. 2010; Sauer et al. 2013). In certain geographical regions, species richness is known or well-understood (e.g. in well-studied areas) in which case these models are also used to uncover the biological drivers of biodiversity in a region (Zipkin et al. 2009; Ruiz-Gutiérrez et al. 2010; Drouilly et al. 2018).

Two formulations of the model have been developed in the literature. The first version assumes that we know the species richness in a certain geographical region (Broms et al. 2016), and it is assumed that all species of interest have been observed at least once. The second version of the model assumes that species richness is an unknown random variable where the aim is to estimate the number of unobserved species (Dorazio and Royle 2005). Both model formulations are generally undertaken using Bayesian methods. They have extensively been used, and computer code to fit these models are freely available online (Dorazio et al. 2006; Broms et al. 2016).

In the literature to date, the modelling process is mostly undertaken using statistical packages such as WinBUGS (Lunn et al. 2000), OpenBUGS (Thomas et al. 2006), JAGS (Plummer 2003), NIMBLE (de Valpine et al. 2017) and Stan (Carpenter et al. 2017). These packages allow the user to apply Bayesian methods to many different application areas and have not been developed specifically to undertake MSO models. WinBUGS, OpenBUGS and JAGS use a combination of Gibbs sampling (Geman and Geman 1984), Metropolis-Hastings (denoted as ‘MH’) steps (Metropolis et al. 1953; Hastings 1970) as well as other custom Markov Chain Monte Carlo (MCMC) samplers (Brooks et al. 2011) to obtain posterior samples from the posterior distribution of the parameters of a Bayesian model. NIMBLE is flexible and allows the user to specify the sampling algorithms that should be used when undertaking posterior sampling. The above software packages with the exception of Stan, allow the user to sample from both discrete and continuous conditional distributions. Stan in contrast only allows posterior sampling from continuous random variables which are undertaken using the No-U-turn Hamiltonian Monte Carlo sampler (Hoffman and Gelman 2014).

Below we develop Gibbs sampling algorithms that can be used to fit MSO models. The Gibbs sampler is a MCMC method that is used to obtain samples from the posterior distribution of the parameters of a statistical model. The method can be used when all of the conditional posterior distributions of the parameters of a model are of a known form and sampling from the conditional posterior distributions are straightforward. The Gibbs sampler is known to exhibit slow mixing in certain cases, although this shortcoming is often overcome by sampling groups of parameters in a block which improves mixing (Roberts and Sahu 1997). In a single-season (single-species) nonspatial and spatial occupancy modelling context, Clark and Altwegg (2019), show that special-purpose Gibbs samplers can produce posterior chains that mix faster and have larger expected sampling rates (Holmes and Held 2006) than those obtained using JAGS and Stan. These results suggest that such algorithms could potentially lead to significant reductions in the run-times of MSO models. Doser et al. (2021) extended Clark and Altwegg (2019) and developed algorithms to fit various Bayesian occupancy models. As one of their developments, they assume that the spatial effects in the occupancy process of a single-season, single-species occupancy model can be modelled through a Gaussian process prior distribution with a Matérn covariance function. They found that for data sets with more than 1000 sites, their algorithm was prohibitively slow, in which case they used a nearest-neighbourhood Gaussian process prior distribution (Datta et al. 2016) to significantly reduce run-times when analysing large detection-nondetection data sets.

Dorazio and Rodriguez (2012) and Johnson et al. (2013) developed Gibbs algorithms to obtain posterior samples for the parameters of a nonspatial and spatial single-species occupancy model, respectively. Both approaches assume that detection and occupancy processes are modelled using probit link functions, which enables the use of data augmentation (Tanner and Wong 1987) to obtain closed form expressions of the conditional posterior distributions of the parameters of an occupancy model. Below we develop Gibbs sampling algorithms to undertake MSO models when species richness is known, and thereafter we describe how the algorithm would change when species richness is unknown. Initially, we assume that the regression effects of the detection and occupancy processes are modelled using probit link functions and after that we extend the analysis to the use of logistic link functions. We do so since we observe that, in certain model formulations, the use of probit link functions does not lead to a simple Gibbs sampling algorithm when sampling from the posterior distribution of the parameters of an MSO model. In these cases, MH steps, which require the monitoring of various additional tuning parameters, could be used. Another reason for using the logistic link function is that it allows one to more easily interpret the regression effects of the model compared to the probit model. Take note that below we do not discuss joint species distribution models (Pollock et al. 2014; Tikhonov et al. 2020) and restrict the focus to occupancy modelling. A detailed review of MSO and its use can be found in Devarajan et al. (2020).

The paper commences by discussing the Bayesian formulation of an MSO model when species richness is assumed to be known in a geographical region, while in Sect. 3, we develop a Gibbs sampler for the MSO model when species richness is unknown. In both cases, it is assumed that the detection and occupancy processes are modelled using probit link functions. In Sect. 4, we extend the previous work and consider the case when the detection and occupancy processes are modelled using logistic functions. We describe how the Gibbs algorithms are constructed and briefly comment on how MSO models can be applied to different taxa or species groups. Before concluding, we re-analyse the data collected from a camera-trapping study undertaken by Drouilly et al. (2018) in order to highlight the use of the methods developed as well.

2 Known species richness: probit link function

The following formulation of the MSO model is similar to Broms et al. (2016).Footnote 1 It is assumed that each site (J in total) can be surveyed numerous times, potentially with unequal survey effort, and that \(n_s\) species have been observed in total. The observed data is stored in a three-dimensional ragged-array \(\varvec{y}\) where \(y_{i,j,k}\) represents the detection-nondetection of species i, at site j during survey k (\(i=1, \ldots ,n_{s}; \;j=1, \ldots , J;\; k=1, \ldots , K_j\)). The random variable \(y_{i,j,k}=1\) if species i is observed at site j during visit k and 0 otherwise. The partially observed occupancy states are stored in a matrix \(\varvec{z}\) where \(z_{i,j}=1\) if species i occupies site j while \(z_{i,j}=0\) if species i does not occupy site j. We assume that the occupancy process can be modelled using site-specific covariates (\(\varvec{X}\)) while the detection process can be modelled using survey-specific covariates (\(\varvec{V}\)) where it is assumed that these covariates do not vary by species.

The model can now be specified using the following Bayesian hierarchical model,

$$\begin{aligned} z_{i,j}\vert \psi _{i,j}&\sim \text {Bernoulli}(\psi _{i,j}), \; i=1, \ldots ,n_s; \;\; j=1,\ldots ,J, \\ y_{i,j,k}\vert z_{i,j},p_{i,j,k}&\sim \text {Bernoulli}(z_{i,j}p_{i,j,k}), k=1, \ldots , K_j, \end{aligned}$$

where occupancy and detection probabilities are modelled using probit link functions, \(\psi _{i,j} = \Phi (\varvec{x}_j^T \varvec{\beta }_i)\) and \(p_{i,j,k} = \Phi (\varvec{v}_{j,k}^T \varvec{\alpha }_i)\; \forall (i,j,k)\). The Bayesian formulation is completed by specifying the prior distributions used. The regression effects of the detection and occupancy processes are modelled using multivariate Gaussian distributions and we specifically assume that

$$\begin{aligned} \varvec{\alpha }_i \vert \varvec{\mu }_{\varvec{\alpha }},\sigma ^2_{\varvec{\alpha }}&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\alpha }}, \sigma ^2_{\varvec{\alpha }} \varvec{I}_{n_d})\; \text {with}\; \varvec{\mu }_{\varvec{\alpha }} \sim \mathcal {N}(\varvec{0}, a^2 \varvec{I}_{n_d}) \; \text {and}\\ \varvec{\beta }_i \vert \varvec{\mu }_{\varvec{\beta }},\sigma ^2_{\varvec{\beta }}&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }}, \sigma ^2_{\varvec{\beta }} \varvec{I}_{n_o}) \; \text {with}\; \varvec{\mu }_{\varvec{\beta }} \sim \mathcal {N}(\varvec{0}, b^2 \varvec{I}_{n_o}). \end{aligned}$$

The coefficient vectors \(\varvec{\mu }_{\varvec{\alpha }}\) and \(\varvec{\mu }_{\varvec{\beta }}\) represent the community-level effects associated with the detection and occupancy processes respectively, and are each assigned a common Gaussian hyper-prior distribution. By doing so, one allows the regression effects to share information since often the detection and occupancy probabilities are not estimable for rare animal species due to lack of detections (Tobler et al. 2015).

Finally, \(\sigma _{\varvec{\alpha }}\) and \(\sigma _{\varvec{\beta }}\) are modelled using a half-Cauchy distribution (Gelman 2006) and can be specified hierarchically as

$$\begin{aligned} \sigma _{\varvec{\alpha }}^2\vert \nabla _{\varvec{\alpha }} \sim \text {IG}\left( 0.5, \frac{1}{\nabla _{\varvec{\alpha }}}\right) ,\;&\nabla _{\varvec{\alpha }} \sim \text {IG}\left( 0.5, \frac{1}{A^2} \right) \; \text {and},\end{aligned}$$
(1)
$$\begin{aligned} \sigma _{\varvec{\beta }}^2\vert \nabla _{\varvec{\beta }} \sim \text {IG}\left( 0.5, \frac{1}{\nabla _{\varvec{\beta }}}\right) ,\;&\nabla _{\varvec{\beta }} \sim \text {IG}\left( 0.5, \frac{1}{B^2} \right) , \end{aligned}$$
(2)

where A and B are known hyper-parameters.

Posterior samples can be obtained from the above MSO model by using a MH algorithm, although we instead derive a Gibbs algorithm that uses the data augmentation scheme developed by Albert and Chib (1993) to represent a probit link function by means of latent Gaussian random variables. The full conditional distributions needed to undertake a Gibbs sampler are all of a familiar form and are shown in the Supplementary Information (refer to Appendix 1.A.1).Footnote 2 In Supplementary Appendix 1.A.2 we also show how similar conditional posterior distributions can be obtained if site-specific covariates are available but no survey-specific covariates are available for the detection process. In this case, site-specific covariates are used as proxies for the survey-specific covariates.

3 Unknown species richness: probit link function

Dorazio and Royle (2005) developed a model that could be used to estimate the species richness (denoted as N) in a geographical region. They assume that there exists a super-population of species (denoted as S) that consists of the observed species, \(n_s\), as well as additional unseen species, \(S-n_s\). Recall that in Sect. 2, the survey occasion was explicitly included in the model formulation. Here all information relating to survey occasions are grouped together such that the observed data \(\varvec{y}_{n_s}\) consists of a \(n_s \times J\) matrixFootnote 3 of the total number of detections for each species at each location (from \(K_j\) surveys, \(j=1, \ldots , J\)). Since N is unknown, an \((S-n_s) \times J\) matrix of zeros are introduced, which represents the data associated with the unobserved species.

A latent indicator variable \(w_i\) is introduced, which takes on the value 1 if species i in the super-population occurs in the geographical region under investigation and 0 otherwise. From the above discussion, \(w_i=1\; \text {for} \; i=1, \ldots ,n_s\) and the species richness in a geographical region is \(N=\sum _{i=1}^S w_i\). The latent variable \(z_{i,j}\) takes on the value 1 if species i occupies location j and 0 otherwise. The \(w_i\) indicator is modelled using a Bernoulli distribution with success parameter \(\Omega\) while \(z_{i,j}\vert w_i\) is modelled as a Bernoulli random variable with success probability, \(w_i \psi _{i,j}\). The detection process (conditional on \(w_i=1, z_{i,j}=1\)) is modelled using a binomial distribution such that

$$\begin{aligned} p(y_{i,j}\vert z_{i,j}=1, p_{i,j}, w_i=1)&= {{K_j}\atopwithdelims (){y_{i,j}}} p_{i,j}^{y_{i,j}} (1-p_{i,j})^{K_j - y_{i,j}}. \end{aligned}$$
(3)

Using the above description it is clear that the model can be formulated hierarchically as

$$\begin{aligned} w_i&\sim \text {Bernoulli}(\Omega ),\; \forall i=1,\ldots ,S,\\ z_{i,j}\vert w_i, \psi _{i,j}&\sim \text {Bernoulli}(w_i\psi _{i,j}), \; \forall i=1, \ldots ,S; \;\; \forall j=1,\ldots ,J, \\ y_{i,j}\vert z_{i,j},w_i, p_{i,j}&\sim \text {Binomial}(K_j, \; w_i z_{i,j}p_{i,j}), \\ p_{i,j}&= \Phi (\varvec{v}_{j}^T \varvec{\alpha }_i) \; \text {with}\; \varvec{\alpha }_i \sim \mathcal {N}(\varvec{\mu }_{\varvec{\alpha }}, \sigma ^2_{\varvec{\alpha }} \varvec{I}_{n_d}),\\ \psi _{i,j}&= \Phi (\varvec{x}_j^T \varvec{\beta }_i) \; \text {with}\; \varvec{\beta }_i \sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }}, \sigma ^2_{\varvec{\beta }} \varvec{I}_{n_o}), \end{aligned}$$

where \(\varvec{v}_j\) and \(\varvec{x}_j\) variables are site-specific covariates used to model the detection and occupancy processes respectively.

Since the detection process is a binomial distribution, the use of the data augmentation method developed previously to sample from the detection regression effects is more cumbersome to use since the number of latent variables required to perform the data augmentation step increases from 1 to \(K_j\) for each i. The remaining parameters however have simple conditional posterior distributions if we use data augmentation to obtain the posterior distribution of the occupancy regression coefficients. A derivation of the resulting conditional posterior distributions can be found in Supplementary Appendix 1.B.

4 The use of logistic link functions in multi-species occupancy models

In Supplementary Appendices 1.C and 1.D we show that if logistic link functions are used to model the regression effects of the detection and occupancy processes, a Gibbs sampling algorithm with closed-form conditional posterior distributions for all parameters of the MSO models in Sects. 2 and 3, can be derived. A Gibbs sampler is preferable to the use of a MH algorithm since the former does not require any ‘tuning’ when undertaking the MCMC sampling.

As an example, consider the MSO model developed in Sect. 3 but assume that the regression effects for detection and occupancy are modelled as

$$\begin{aligned} \text {logit}(p_{i,j})&= \varvec{v}_{j}^T \varvec{\alpha }_i \, \text {where}\, \varvec{\alpha}_i \sim \mathcal {N}\bigg (\varvec{\mu }_{\varvec{\alpha }}, \sigma ^2_{\varvec{\alpha }} \varvec{I}_{n_d}\bigg ),\\ \text {logit}(\psi _{i,j})&= \varvec{x}_j^T \varvec{\beta }_i \, \text {where}\, \varvec{\beta}_i \sim \mathcal {N}\bigg (\varvec{\mu }_{\varvec{\beta }}, \sigma ^2_{\varvec{\beta }} \varvec{I}_{n_o}\bigg ), \end{aligned}$$

and both \(\varvec{\mu }_{\varvec{\alpha }}\) and \(\varvec{\mu }_{\varvec{\beta }}\) are modelled hierarchically as done previously. The conditional posterior distributions of most of the parameters remain the same as those defined in Sect. 3 although the conditional posterior distributions of \(\varvec{\alpha }_i\vert .\) and \(\varvec{\beta }_i\vert .\) are slightly amended as detailed below. When \(w_i=0\), \(\varvec{\beta }_i\vert . \sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }}, \sigma _{\varvec{\beta }}^2 \varvec{I}_{n_o})\), while when \(w_i=1\), posterior samples of \(\varvec{\beta }_i\) can be obtained by sampling from a Pólya-Gamma distribution and Gaussian distribution in turn. Specifically,

$$\begin{aligned} \varvec{\omega }_{i,j}^{(\varvec{\beta })}\vert \varvec{\beta }, w_i=1,.&\sim \text {PG}(1, \varvec{x}_j^T \varvec{\beta }_i), \, \forall j=1, \ldots , J\, \text {and}\\ \varvec{\beta }_i\vert \varvec{y}, \varvec{\omega }_i^{(\varvec{\beta })}, w_i=1,.&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\beta }_i}, \, \varvec{\Sigma }_{\varvec{\beta }_i}), \end{aligned}$$

where the ‘PG’notation denotes a Pólya-Gamma random variable, \(\varvec{\omega }_i^{(\varvec{\beta })}= \left[ \omega _{i,1}^{(\varvec{\beta })}, \ldots , \omega _{i,J}^{(\varvec{\beta })}\right] ^T\), \(\tau _{\varvec{\beta }}=\frac{1}{\sigma ^2_{\pmb \beta }}\) and

$$\begin{aligned} \varvec{\Sigma }_{\varvec{\beta }_i}&= \left( \tau _{\varvec{\beta }} \varvec{I}_{n_o} + \varvec{X}^T \varvec{D}_{\varvec{\beta }} \varvec{X}\right) ^{-1}, \\ \varvec{\mu }_{\varvec{\beta }_i}&= \varvec{\Sigma }_{\varvec{\beta }_i} \left( \varvec{X}^T \varvec{\Theta }_{\varvec{\beta }} + \tau _{\varvec{\beta }} \varvec{\mu }_{\varvec{\beta }}\right) . \end{aligned}$$

The matrices \(\varvec{D}_{\varvec{\beta }}\) and \(\varvec{\Theta }_{\varvec{\beta }}\) are defined as, \(\varvec{D}_{\varvec{\beta }}= \text {diagonal}\left[ \omega _{i,1}^{(\varvec{\beta })}, \ldots , \omega _{i,J}^{(\varvec{\beta })}\right]\) and \(\varvec{\Theta }_{\varvec{\beta }} = \left[ z_{i,1}-1/2, \ldots , z_{i,J}-1/2\right] ^T\). The conditional posterior distribution of \(\varvec{\alpha }_i\vert .\) can be obtained similarly such that

$$\begin{aligned} \varvec{\omega }_{i,j}^{(\varvec{\alpha })}\vert \varvec{\alpha }, w_i=1,.&\sim \text {PG}(K_j, \varvec{x}_j^T \varvec{\alpha }_i), \, \forall j=1, \ldots , j^*,\\ \varvec{\alpha }_i\vert \varvec{y}, \varvec{\omega }_i^{(\varvec{\alpha })} , w_i=1,.&\sim \mathcal {N}(\varvec{\mu }_{\varvec{\alpha }_i}, \, \varvec{\Sigma }_{\varvec{\alpha }_i}), \end{aligned}$$

where \(\varvec{\omega }_i^{(\varvec{\alpha })}= \left[ \omega _{i,1}^{(\varvec{\alpha })}, \ldots , \omega _{i,j^*}^{(\varvec{\alpha })}\right] ^T\), \(\tau _{\varvec{\alpha }}=\frac{1}{\sigma ^2_{\pmb \alpha }}\) and

$$\begin{aligned} \varvec{\Sigma }_{\varvec{\alpha }_i}&= \left( \tau _{\varvec{\alpha }} \varvec{I}_{n_d} + \tilde{\varvec{V}}^T \varvec{D}_{\varvec{\alpha }} \tilde{\varvec{V}}\right) ^{-1} , \\ \varvec{\mu }_{\varvec{\alpha }_i}&= \varvec{\Sigma }_{\varvec{\alpha }_i} \left( \tilde{\varvec{V}}^T \varvec{\Theta }_{\varvec{\alpha }} + \tau _{\varvec{\alpha }}\varvec{\mu }_{\varvec{\alpha }}\right) , \end{aligned}$$

where \(\tilde{\varvec{V}}\) is defined as the elements of \(\varvec{V}\) associated with \(z_{i,j}=1\) while the matrices \(\varvec{D}_{\varvec{\alpha }}\) and \(\varvec{\Theta }_{\varvec{\alpha }}\) are defined as, \(\varvec{D}_{\varvec{\alpha }}= \text {diagonal}\left[ \omega _{i,1}^{(\varvec{\alpha })}, \ldots , \omega _{i,j^*}^{(\varvec{\alpha })}\right]\) and \(\varvec{\Theta }_{\varvec{\alpha }} = \left[ y_{i,1}-K_1/2, \ldots , y_{i,j^*}-K_{j^*}/2\right] ^T\) for an appropriate value of \(j^*\). Complete derivations of the conditional posterior distributions of the MSO model can be found in Supplementary Appendix 1.C while Supplementary Appendix 1.D contains the required conditional distributions when species richness is assumed known. In the above formulations of the MSO model, it is assumed that both the detection and occupancy regression effects are modelled as realisations from a common Gaussian distribution. These models can easily be extended by adding one additional level of complexity and assuming that the regression effects of different groups (e.g. bird guilds, taxa or species groups) are modelled as realisations from different Gaussian distributions. The required algorithm is included in Supplementary Appendix 1.D.2. The above model formulations can easily be extended to account for a Bernoulli detection process or the inclusion of spatial random effects in either the detection or occupancy processes.

5 Application

As an application of the methods developed, we re-analyse the data collected from a camera-trapping study undertaken by Drouilly et al. (2018). The Anysberg Nature Reserve (33\(^{\circ }\) 31\(^\prime\) S, 20\(^{\circ }\) 37\(^\prime\) E) in South Africa (referred to as Anysberg) is the study area. The camera trapping design consisted of 156 camera trap sites (deployed from the end of September 2013 to May 2014) each placed at a location within a 2 km grid design (Kinnaird and O’Brien 2012). Data were collected using Bushnell Trophy CAM HG (Bushnell Outdoor Products, Overland Park, Kansas) camera traps and consisted of the processing of camera images obtained when cameras were triggered in the field. The study specifically focused on terrestrial vertebrates \(> 0.5\) kg in mass. A detailed explanation of camera trap survey design can be found in Sect. 2.2 of Drouilly et al. (2018). A list of the species observed at least once at Anysberg during the course of the study can be found in Supplementary Appendix 1.E.

Since the number of detections obtained daily was low, it was decided to define a sampling occasion as a 6 day period. From the captured images an observation matrix was constructed which consisted of the observed number of times species i (for \(i=1, \ldots , n_s=35\)) was detected at site j (\(j=1, \ldots , J=156\)) during \(K_j\) sampling occasions. The observed data is stored as \(\left[ y_{i,j}\right]\).

Below we use the observed species at Anysberg and relate the occupancy and detection processes to explanatory variables. We assume that species richness is known and fit the MSO model described in Sect. 2 (assuming logistic link functions) to the data. The occupancy process is modelled as a function of elevation (measured as metres above sea-level), the modified soil-adjusted index (MSAVI2) (Qi et al. 1994) and a PreyIndex.Footnote 4 MSAVI2 is a vegetation index similar to the normalised difference vegetation index. MSAVI2 was converted into a binary indicator variable (named MSAVI2ind) such that MSAVI2ind equals 1 if the standardised variable MSAVI2 is positive. The detection process is modelled as a function of a binary indicator variable, trail and a habitat factor variable. The trail variable equals 1 when a camera is placed on a trail and 0 otherwise. The habitat factor variable has three levels, namely, plain, river and mountain, where the mountain level is used as the baseline.

We fit all model combinations (32 in total, see Supplementary Appendix 1.F) and assume that all models include an intercept in the detection and occupancy process. We further assume that we are in a \(\mathcal {M}\)-complete setting (Bernardo and Smith 1994) and acknowledge that other variables might well be the true drivers of both the occupancy and detection process but choose to use the ones listed above. All MCMC sampling was undertaken using our code. For all models, three chains of 100,000 draws were used with a burn-in proportion of one-third. The following hyper-parameters were used: \(a^2=2.25^2=b^2\) and \(A^2=2.25^2=B^2\). MSO models produce significant amounts of outputs and are computationally expensive. Because of this, we thinned the resulting MCMC chains using a factor of ten. Various convergence testsFootnote 5 were undertaken to assess that the MCMC chains had converged (Smith 2007).

Exploratory (with-in sample) model selection was initially undertaken by first considering the WAIC (Watanabe 2010) and the approximate leave-one-out cross-validation scores (LOO) using Pareto smoothed importance sampling (Vehtari et al. 2017) for each of the candidate models. In order to assess the variability of the model selection statistics, ten independent MCMC runs were undertaken, and two goodness-of-fit statistics, namely the Bayesian p-value (Meng 1994) and the sampled Bayesian p-value (Zhang 2014) were also calculated. Gosselin (2011) shows that, asymptotically, the distribution of the sampled Bayesian p-value is uniformly distributed over the interval [0, 1] which suggests that misspecified models will not have uniformly distributed sample Bayesian p-values. The Bayesian posterior probability of each of the models were calculated using the version of the Reversible jump MCMC (RJMCMC) algorithm developed by Barker and Link (2013), which were then used to undertake Bayesian model averaging (Hoeting et al. 1999). We assume a priori that each model is equally likely. Refer to Supplementary Appendix 1.G for more details regarding the above mentioned model selection methods.

5.1 Application results

Both model selection measures require the calculation of the variance of the logarithm of the integrated likelihood evaluated using the posterior samples at each MCMC iteration. Based on ten independent MCMC runs, we found that both model selection measures were not accurately estimated since the previous mentioned variance calculation was often greater than 0.4 thus rendering both model selection methods unreliable (Vehtari et al. 2017).

Fig. 1
figure 1

The sampled Bayesian p-values of each model based on 10 independent replications. The highlighted boxplots indicate that the assumption that the distribution of the sampled Bayesian p-values are uniformly distributed over the interval [0, 1] cannot be rejected for these models when testing at a 30% significance level. The model associated with the ‘Model number’ can be found in Supplementary Appendix 1.F

The variability of the Bayesian p-values was of an order of magnitude smaller than those observed for the sampled Bayesian p-values (see Supplementary Appendix 1.H). Anderson-Darling tests (Anderson and Darling 1954) using the sampled Bayesian p-values obtained from ten independent MCMC runs, indicating that the assumption that the sampled Bayesian p-values are uniformly distributed over the interval [0, 1] cannot be rejected for fifteen of the thirty two models (assuming a 30% significance level).Footnote 6 These models have been highlighted in grey in Fig. 1, which displays the boxplots of the sampled Bayesian p-values based on ten independent MCMC runs. The models and the p-values of the Anderson–Darling tests are tabulated in Table 1. Given the small sample size (of ten), these tests tentatively suggest that models that only include an intercept term in the occupancy process are misspecified. These results also suggest that none of the variables considered always lead to misspecified models and thus all variables do appear to have some predictive ability of occupancy for the species considered in this data set.

Table 1 The list of models associated with non-significant Anderson–Darling tests when testing the null hypothesis that the distribution of the sampled Bayesian p-values are uniformly distributed over the range [0, 1]; the respective p-value of the test as well as the median sampled Bayesian p-value of the associated models

Conn et al. (2018) suggest that the Bayesian p-values should not be used to undertake model selection, but should rather be used to identify models that display clear signs of model misspecification. In our context, models 3, 4, 21, 25, 26, 29, and 30 appear to be misspecified since they either have a Bayesian p-value above 0.8 or are smaller than 0.2. These models either only include an intercept term in the detection or occupancy process; do not include the elevation variable in the occupancy process or only includes the trail variable in the detection process.

Table 2 The posterior model probabilities for the five best models obtained using RJMCMC

The resulting posterior model probabilities from the RJMCMC algorithm are displayed in Table 2. The models listed contain a combination of the elevation and PreyIndex variable in the occupancy process as well as a combination of the indicator variables trail, plain and river in the detection process. From these results, we observed that Model 6 has the largest posterior probability with a value of 0.85, Model 8 has the second largest posterior probability with a value of 0.14 while all other models have very low posterior mass associated with them.

From an examination of the population regression effects in Table 4 of Supplementary Appendix 1.I it can be seen that the mean population regression effects for plain and river in the detection process (of Models 8 and 24) both have 95% credibility intervals that contain zero. Similarly, the PreyIndex variable in Models 22 and 24 also have 95% credibility intervals that contain zero. Table 3 contains the Bayesian model averaged population regression effectsFootnote 7 as well as the posterior effect probabilities (the probability that a population regression effect is non-zero). The results clearly indicate that trail and elevation have non-zero regression effects in the detection and occupancy processes respectively while the remaining regression effects have a high probability of being zero. These results indicate that on average one is more likely to detect one of the animal species investigated off a trail than on it and that on average no significant differences in detectability of animal species are observed between the different habitat types. We found that the mean population regression effect for elevation is negative, indicating that on average higher occupancy probabilities are observed at lower elevation levels than at higher ones.

Table 3 The Bayesian model averaged community-level regression effects. (Posterior mean, Standard deviation as well as posterior effect probabilities.) Regression effects cannot be calculated if \(\text {Pr}(\varvec{\beta }\not = \varvec{0} \vert \text {Data})=0\) and is captured as ‘–’

Figure 2 displays caterpillar plots highlighting the mean community regression effects as well as the 95% credibility intervals for the species-specific regression effects of trail (a) and elevation (b) for the detection and occupancy processes respectively. Of the 35 species photographed, the occupancy probability was strongly related to elevation for 19 species. The results indicate that Smith’s rock hare (Pronolagus rupestris), klipspringers (Oreotragus oreotragus) and leopard’s (Panthera pardus) were more likely to occupy high lying regions of the reserve compared to many of the antelope species that preferred low-lying regions of the reserve. In general, the detection of most carnivore species were negatively related to trail while the detection of species such as the hyrax (Procavia capensis) and the duiker (Sylvicapra grimmia) were positively related to trail. Plots displaying the estimated mean occupancy probability of the different species can be found in Supplementary Appendix 1.J.

We found that the above results were not sensitive to the specification of the prior distribution. To this end, in Supplementary Appendix 1.K we display statistics pertaining to the community-level regression effects for Model 6 when run using three chains of 100,000 in length. A burn-in sample of one-third is specified. Thereafter the resulting chains were thinned by retaining every 10th observation. The different prior distributions were specified as follows: Prior 1: \(a^2=2.25^2=b^2\), \(A^2=2.25^2=B^2\); Prior 2: \(a^2=2.25^2=b^2\), \(A^2=2.25=B^2\); Prior 3: \(a^2=5^2=b^2\), \(A^2=5^2=B^2\); Prior 4: \(a^2=5^2=b^2\), \(A^2=5=B^2\).

Fig. 2
figure 2

Caterpillar plots highlighting the mean community regression effects as well as the 95% credibility intervals for the species-specific regression effects of trail (a) and elevation (b). The thick dashed lines indicate the 95% equal tail credibility interval for the mean community response to each variable

6 Discussion and conclusions

Multi-species occupancy models are important tools for statistical ecologists since they allow one to undertake community-level as well as species-specific inference by using detection-nondetection data of many species at different locations (Dorazio and Royle 2005; Dorazio et al. 2006; Broms et al. 2016). The information across species are pooled together, which allows one to obtain occupancy probabilities of rare species, which would otherwise not be possible due to the low detectability of these species.

Here, we develop Gibbs sampling algorithms when probit or logistic link functions are used to model the detection and occupancy processes respectively, and show how this can be done for a number of different versions of the MSO models. We also specifically show that the algorithms for MSO models (that use a logistic link function) are similar to those developed for single-season occupancy models as well as single-season spatial occupancy models where all algorithms use the work of Polson et al. (2013) and Clark and Altwegg (2019) to develop efficient Gibbs sampling algorithms.

In our application, we found that model selection methods such as WAIC (Watanabe 2010) and the approximate leave-one-out cross-validation scores using Pareto smoothed importance sampling (Vehtari et al. 2017) were unreliable for the data considered, a possible reason for this being the sparsity of data at several sites. Model selection was undertaken using Reversible jump MCMC (Barker and Link 2013), while the occupancy surfaces for all species considered were obtained using Bayesian model averaging (see Supplementary Appendix 1.J).

A single MSO model took approximately 12–13 min to fit compared to approximately 4 h when using Stan. All calculations were performed using R 4.0.4 (Core 2014) on a Windows 10 desktop with a AMD Ryzen 7 3700X 8-Core Processor, a clock speed of 3.59 GHz and 64 GB of RAM. On the same system, the RJMCMC algorithm took approximately one hour to complete. These run-times were not found to be prohibitively long, and thus we would encourage others to consider using the technique when undertaking model selection. The authors used their own code to undertake the RJMCMC algorithm, although the R package rjmcmc (Gelling et al. 2019) could also be used.