Introduction

Model selection is a critical part of ecological inference1,2,3, allowing scientists to weigh and ultimately choose between competing ecological hypotheses. In the absence of clear empirical evidence or unlimited time to perform manipulative experiments, computational model fitting allows researchers to compare the merits of alternative models1,4,5. However, despite an increase in the use of Bayesian models in recent decades6,7, rigorously tested methods of Bayesian model selection remain relatively sparse5.

Uncertainty in ecological models comes from two main sources: variability in the ecological process and error in the observation process. First, the ecological process model reflects variation in the state of the system itself8, as well as mechanistic uncertainty about which environmental variables predict system outcomes3. Second, observations of ecological data are never perfect, and direct information on the system is rarely available9. Observations are often biased by imperfect detection, imprecision in measurement tools, and human error. To confront these challenges, a suitable method is needed to evaluate and compare the available candidate models while accounting for both sources of uncertainty.

One common method for Bayesian model selection is the Watanabe-Akaike information criterion (WAIC)10, a generalized version of AIC. WAIC combines the estimated log predictive density of the data with a penalty for the estimated effective number of parameters in the model to produce a value that can be compared between alternative models. This criterion is valid in both hierarchical and mixture models10 and has been shown to be an efficient alternative to cross-validation when data are uncorrelated11,12. Mathematically, however, WAIC uses only the fit of the data under the observation process, suggesting it may not be appropriate for discriminating between alternative models with different underlying state processes. Moreover, WAIC is only appropriate when processes have no spatial or temporal auto-correlation5, an assumption rarely met in ecological systems.

We evaluate three prediction-based Bayesian model selection approaches that can be used when there is auto-correlation in the ecological process and imperfect detection in the observation process. We compare the performance of conditional WAIC, posterior predictive loss, and a joint-likelihood alternative to WAIC that accounts for the likelihoods of both the observation and state processes. We test these criteria on simulated single-season N-mixture models, simulated multi-season N-mixture models with temporal auto-correlation, and three case studies of single-season N-mixture models for Bewick’s wren (Thryomanes bewickii), mourning dove (Zenaida macroura) and house finch (Haemorhous mexicanus) based on eBird data13 collected in California.

Methods

WAIC

WAIC10 is calculated using the expected log predictive densities of all data points. Following ref. 11, WAIC is defined as:

$$\begin{aligned} WAIC = -2\widehat{lpd} + 2\widehat{p}_{waic} \end{aligned}$$
(1)

where \(\widehat{lpd}\) is the estimated log pointwise predictive density of the data \((\widehat{lpd} = \sum _{i = 1}^{n} \log P(y_i | \hat{\theta }_i))\), and \(\widehat{p}_{waic}\) is the estimated effective number of parameters \((\widehat{p}_{waic} = \sum _{i = 1}^{n} var_{post}(\log P(y_i | \hat{\theta }_i)))\). If the posterior distribution of \(\theta \) is obtained directly from the likelihood of \(y_i\) without any latent processes, and there is no sampling bias in the observation process, then WAIC is sufficient to identify the top model amongst all candidate models. However, in many ecological studies sampling error is unavoidable because ecological processes are inherently difficult, if not impossible, to observe completely, so ecologists must address imperfect detection during model development and selection.
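In practice, both terms are estimated from MCMC output. The following minimal sketch in R (our own helper function, not from a package, following the posterior-density estimator of ref. 11) takes an \(S \times n\) matrix of pointwise log-likelihoods evaluated at \(S\) posterior draws:

```r
# loglik: S x n matrix; loglik[s, i] = log P(y_i | theta^(s)) for posterior draw s
waic <- function(loglik) {
  lpd    <- sum(log(colMeans(exp(loglik))))  # log pointwise predictive density
  p_waic <- sum(apply(loglik, 2, var))       # posterior variance of each log density
  c(WAIC = -2 * lpd + 2 * p_waic, lpd = lpd, p_waic = p_waic)
}
```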

We demonstrate the impact of imperfect detection on model selection using a binomial N-mixture model, in which both abundance (N) and detection probability (p) vary with site i. Detection probability may also vary between surveys (j) due to weather or other temporal environmental conditions. The data \(y_{ij}\) are the observed counts at site i during survey j of \(J\) surveys.

$$\begin{aligned} \mathrm {\log }(\mu _i)&= \beta _0 + \beta _1 x_{i1} + \beta _2 x_{i2} + \cdots \end{aligned}$$
(2)
$$\begin{aligned} N_i&\sim \textrm{Poisson}(\mu _i) \end{aligned}$$
(3)
$$\begin{aligned} \textrm{logit}(p_{ij})&= \alpha _0 + \alpha _1 x_{i1} + \cdots \end{aligned}$$
(4)
$$\begin{aligned} y_{ij}&\sim \textrm{Binomial}(N_i, p_{ij}). \end{aligned}$$
(5)

Traditionally, WAIC does not directly use the posterior of \(N_i\), the state process. As \(p_{ij}\) approaches 1, the log predictive density of the data approaches 0 and the state process has no effect on WAIC. Thus, when \(p_{ij}\) approaches 1, WAIC cannot distinguish differences in the fit of candidate models with different covariates on N.

$$\begin{aligned} P(y_{ij}| p_{ij}, N_{i})&= \left( {\begin{array}{c}N_i\\ y_{ij}\end{array}}\right) p_{ij}^{y_{ij}}(1-p_{ij})^{(N_i-y_{ij})} \end{aligned}$$
(6)
$$\begin{aligned} \lim _{p_{ij} \rightarrow 1} \log (P(y_{ij}| p_{ij}, N_{i}))&= \log \left[ \left( {\begin{array}{c}N_i\\ y_{ij}\end{array}}\right) 1^{y_{ij}}(0)^{(N_i-y_{ij})}\right] = 0 \end{aligned}$$
(7)

In this limit detection is perfect, so \(y_{ij} = N_i\) and the bracketed probability equals 1, giving a log predictive density of 0. Similarly, as \(p_{ij}\) approaches 0, every observed count is \(y_{ij} = 0\) and the log predictive density of the data again approaches 0:

$$\begin{aligned} \lim _{p_{ij} \rightarrow 0} \log (P(y_{ij}| p_{ij}, N_{i})) = \log \left[ \left( {\begin{array}{c}N_i\\ y_{ij}\end{array}}\right) 0^{y_{ij}}(1)^{(N_i-y_{ij})}\right] = 0 \end{aligned}$$
(8)

In other words, when the parameter accounting for imperfect detection is near the boundaries of its support, the usefulness of WAIC for model selection is limited.
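This boundary behaviour is easy to verify numerically; a quick check in R with illustrative values:

```r
N <- 10
# Near-perfect detection: the observed count equals true abundance, log density ~ 0
dbinom(N, size = N, prob = 0.999, log = TRUE)  # -0.01, regardless of N
# Near-zero detection: the observed count is zero, log density again ~ 0
dbinom(0, size = N, prob = 0.001, log = TRUE)  # -0.01
```

In both limits the observation likelihood carries almost no information about the latent abundance.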

Posterior predictive loss

We also consider a largely untested model selection method, posterior predictive loss14,15, that is theoretically appropriate when there is auto-correlation in the ecological process. Posterior predictive loss (\(D_{sel}\)) rewards models that minimize the squared difference between the expected values \(\mu _{ij}\) and the observed data \(y_{ij}\) (i.e., model fit) while penalizing large predictive variance (\(\widehat{\sigma ^2}\)). It is analogous to familiar information criteria, but does not penalize models based on the effective number of parameters5. Posterior predictive loss is defined as

$$\begin{aligned} D_{sel} = \sum _{i = 1}^{n} \sum _{j = 1}^{J} (\mu _{ij} - y_{ij})^2 + \sum _{i = 1}^{n} \sum _{j = 1}^{J} \sigma ^2_{ij}, \end{aligned}$$
(9)

where \(\mu _{ij} = E(y_{ij})\) is the posterior predictive mean and \(\sigma ^2_{ij}\) is the posterior predictive variance of \(y_{ij}\). Similar to WAIC, posterior predictive loss considers only the observation process in the likelihood of \(y_{ij}\) and does not incorporate direct inference on latent processes. In other words, changes to the model structure for N (e.g., adding or removing covariates) are not directly captured by posterior predictive loss.
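As a sketch, \(D_{sel}\) can be computed from posterior predictive draws (function and variable names here are ours; yrep is an \(S \times n\) matrix of replicated observations with the site-by-survey index flattened):

```r
# yrep: S x n matrix of posterior predictive draws; y: observed counts (length n)
d_sel <- function(yrep, y) {
  mu     <- colMeans(yrep)        # posterior predictive mean of each observation
  sigma2 <- apply(yrep, 2, var)   # posterior predictive variance (penalty term)
  sum((mu - y)^2) + sum(sigma2)   # goodness-of-fit term + variance penalty
}
```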

WAICj

We developed a joint-likelihood adjustment to WAIC based on the joint distribution of the observation and latent processes. We define our joint-likelihood approach to WAIC as:

$$\begin{aligned} WAICj =&-2\widehat{\lambda _{\text {j}}} + 2\widehat{p_{waicj}}, \end{aligned}$$
(10)
$$\begin{aligned} \lambda _{\text {j}} =&\sum _{i = 1}^{n} \left[ \log (P(y_i | N_i)) + \log (P(N_i | \mu _i))\right] \end{aligned}$$
(11)
$$\begin{aligned} p_{waicj} =&\sum _{i = 1}^{n} var_{post}\left( \log (P(y_i | N_i)) + \log (P(N_i | \mu _i))\right) , \end{aligned}$$
(12)

where \(\mu _i\) is the expected abundance at site i given \(N_i \sim \textrm{Poisson}(\mu _i)\). Unlike the traditional formulation of WAIC, the joint log predictive density does not approach 0 when \(p_{ij}\) reaches the boundaries of its support:

$$\begin{aligned} \lim _{p_{ij} \rightarrow 1} \left[ \log (P(y_i | N_i)) + \log (P(N_i | \mu _i))\right]&= \log \left( \frac{\mu _{i}^{N_i} e^{-\mu _i}}{N_i!}\right) \end{aligned}$$
(13)

In the limit, the observation term approaches 0 (as in Eq. 7), but the Poisson state process term remains. Under this formulation, model selection accounts for the log-likelihoods of both the detection model and the underlying state process model. Thus, if abundance is thought to vary between sites in response to environmental covariates, selection between candidate models with different covariates on abundance remains possible even as detection probability approaches the boundaries of its support.
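Computationally, WAICj only changes the pointwise log-likelihood saved at each MCMC iteration. A minimal sketch, assuming posterior draws of \(N_i\), \(p_{ij}\), and \(\mu _i\) have been monitored:

```r
# y: nsite x nvisit counts; N, mu: S x nsite posterior draws; p: S x nsite x nvisit
joint_loglik <- function(y, N, p, mu) {
  S <- nrow(N); n <- ncol(N)
  ll <- matrix(NA_real_, S, n)
  for (s in 1:S) for (i in 1:n) {
    ll[s, i] <- sum(dbinom(y[i, ], size = N[s, i], prob = p[s, i, ], log = TRUE)) +
      dpois(N[s, i], lambda = mu[s, i], log = TRUE)  # latent-state contribution
  }
  ll
}
# WAICj is then waic(joint_loglik(y, N, p, mu)), reusing the waic() sketch above
```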

Simulation—single season N-mixture models

We simulated abundance in an N-mixture model framework for 300 data sets to compare three model selection approaches: standard conditional WAIC, posterior predictive loss, and our proposed joint-likelihood WAIC (WAICj). We considered our simulations analogous to a survey of bird populations. We generated a covariate representing the proportion of forested area at each site, and then simulated ‘bird’ abundance at 50 sites as a random draw from an inhomogeneous Poisson distribution based on the simulated covariate. The covariate coefficient was drawn randomly for each simulation, after which we generated the state process (\(N_i\)). We simulated the observation process as four independent visits to each site, with a single covariate effect on detection probability analogous to wind speed.
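A minimal sketch of one simulated data set under this design (the coefficient values below are illustrative placeholders, not the exact values drawn in the study):

```r
set.seed(1)
nsite <- 50; nvisit <- 4
forest <- runif(nsite)                            # proportion forested at each site
beta0 <- 1; beta1 <- 1                            # illustrative abundance coefficients
N <- rpois(nsite, exp(beta0 + beta1 * forest))    # latent abundance, eqs. (2)-(3)
wind <- matrix(rnorm(nsite * nvisit), nsite)      # visit-level detection covariate
alpha0 <- -2; alpha1 <- -0.5                      # illustrative low-detection values
p <- plogis(alpha0 + alpha1 * wind)               # detection probability, eq. (4)
y <- matrix(rbinom(nsite * nvisit, size = N, prob = p), nsite)  # counts, eq. (5)
```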

We evaluated model selection performance under three sampling intensities (15, 25 or 50 sites) and two detection scenarios (low detection and high detection). We specified the low detection scenario to have an average detection probability of 0.13 (range 0.06–0.24) and the high detection scenario an average detection probability of 0.84 (range 0.73–0.93). We analyzed the data from each simulation using four candidate binomial N-mixture models (Table 1) and all three model selection criteria. For each criterion, we calculated the weight of each candidate model following the standard formula for Akaike weights1,16, shown below. We evaluated the performance of each criterion as the proportion of simulations in which the selected model was the true generating model.
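Specifically, if \(\Delta _m\) is the difference between the criterion value of candidate model m and the minimum criterion value across all K candidates, the weight of model m is

$$\begin{aligned} w_m = \frac{\exp (-\Delta _m/2)}{\sum _{k = 1}^{K} \exp (-\Delta _k/2)}, \end{aligned}$$

with posterior predictive loss values treated analogously.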

Table 1 Candidate models. The asterisk denotes the model under which data were simulated. Periods represent intercept-only models.

We also assessed performance when additional site visits were added at each sampling intensity. Using the same sampling intensities and detection scenarios described above, we simulated data from 4, 8, and 16 independent visits to each site and again scored each criterion by the proportion of simulations in which the selected model was the true generating model.

Simulation—multi-season N-mixture models

We also simulated abundance under a dynamic N-mixture model to compare the three model selection approaches when temporal auto-correlation was present in abundance. The model for first-year abundance (\(N_{i1}\)) matched the single-season scenarios, with abundance dependent on a simulated site-level covariate (\(x_{i1}\)) representing forest cover.

$$\begin{aligned} \mathrm {\log }(\mu _{i1})&= \beta _0 + \beta _1 x_{i1}, \end{aligned}$$
(14)
$$\begin{aligned} N_{i1}&\sim \textrm{Poisson}(\mu _{i1}) \end{aligned}$$
(15)

Following the first year, abundance at each site was simulated as a function of the previous year’s expected abundance and a site-specific growth rate, dependent on the site-level covariate and an effect of time period (t).

$$\begin{aligned} \mathrm {\log }(\psi _{it})&= \delta _0 + \delta _1 x_{i1} + \delta _2 t \end{aligned}$$
(16)
$$\begin{aligned} \mu _{it}&= \psi _{it} \mu _{it-1} \end{aligned}$$
(17)
$$\begin{aligned} N_{it}&\sim \textrm{Poisson}(\mu _{it}) \end{aligned}$$
(18)

We simulated the multi-season model with two time trends, weak and strong. As with the single season N-mixture models, we evaluated model selection performance under two detection scenarios (low detection and high detection) and when 15, 25 or 50 sites were sampled with 4 independent visits per site. For each scenario we simulated 100 data sets, each with 5 years of data, for a total of 600 data sets. We used the same observation process and candidate models as in the single-season scenarios.
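A self-contained sketch of the dynamic simulation step (growth-rate coefficients are illustrative placeholders, not the study's values):

```r
set.seed(2)
nsite <- 50; nyear <- 5
forest <- runif(nsite)                       # site-level forest cover
beta0 <- 1; beta1 <- 1                       # first-year abundance (illustrative)
delta0 <- 0; delta1 <- 0.1; delta2 <- 0.1    # growth-rate coefficients (illustrative)
mu <- matrix(NA_real_, nsite, nyear)
N  <- matrix(NA_integer_, nsite, nyear)
mu[, 1] <- exp(beta0 + beta1 * forest)       # eq. (14)
N[, 1]  <- rpois(nsite, mu[, 1])             # eq. (15)
for (t in 2:nyear) {
  psi     <- exp(delta0 + delta1 * forest + delta2 * t)  # growth rate, eq. (16)
  mu[, t] <- psi * mu[, t - 1]                            # eq. (17)
  N[, t]  <- rpois(nsite, mu[, t])                        # eq. (18)
}
# The observation layer (binomial visits) is identical to the single-season sketch
```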

Case study—California eBird data

We applied N-mixture models to point count data for three bird species in California, U.S., to demonstrate how model selection outcomes affect inference. We used data from the global eBird database, which consists of semi-structured avian point count data collected by community scientists17. Data are reported as ‘checklists’ of species detections, with the location, time of survey, number of observers, length of survey, and number of detections of each species reported for each checklist. We selected 180 eBird ‘complete checklists’ recorded across 48 locations in southern California between April and July, 2019. We chose these dates to represent the breeding season, when male birds may be more likely to vocalize. Data were downloaded from eBird and formatted using the auk package in R13,18 and code presented in Goldstein19.
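For reference, a filtering workflow of this kind typically looks like the following with auk (the file name, bounding box, and filters here are placeholders, not the study's exact query; see Goldstein19 for the actual code):

```r
library(auk)
# Placeholder file name; the eBird Basic Dataset must be downloaded separately
ebd <- auk_ebd("ebd_US-CA_201904_201907.txt") |>
  auk_species(c("Bewick's Wren", "Mourning Dove", "House Finch")) |>
  auk_date(c("2019-04-01", "2019-07-31")) |>
  auk_bbox(c(-121, 32.5, -114, 35.8)) |>  # rough southern California box (illustrative)
  auk_complete()                           # complete checklists only
auk_filter(ebd, file = "ebd_filtered.txt") # runs auk's AWK-based filter
checklists <- read_ebd("ebd_filtered.txt")
```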

We focused our analysis on three bird species with differing total raw counts: Bewick’s wren, mourning dove and house finch. For each site, we considered three possible environmental covariates that could affect abundance (elevation, proportion of forest cover and proportion of water) and four detection covariates (duration of survey, number of observers, proportion of forest cover and proportion of water). Elevation and landcover data for the state of California were downloaded from WorldClim20 and the LandFire GIS database’s Existing Vegetation Type layer21, respectively. For each species, we fit 16 candidate binomial N-mixture models and ranked them with the three model selection criteria.

All simulation and case study models were fit by Markov chain Monte Carlo (MCMC) using the nimble package22 in R version 4.1.323. We ran 2 MCMC chains for 10,000 iterations each with no thinning (a thinning rate of 1). Chain convergence was assessed by the Gelman-Rubin statistic11 and visual inspection of traceplots.
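For context, a minimal nimble specification of the simplest candidate model (a forest effect on abundance, constant detection) might look as follows; the priors, initial values, and monitors are illustrative, and the study's actual model code may differ:

```r
library(nimble)
nmix_code <- nimbleCode({
  beta0 ~ dnorm(0, sd = 2)            # abundance intercept (illustrative prior)
  beta1 ~ dnorm(0, sd = 2)            # forest effect on abundance
  alpha0 ~ dnorm(0, sd = 2)           # detection intercept
  for (i in 1:nsite) {
    log(mu[i]) <- beta0 + beta1 * forest[i]
    N[i] ~ dpois(mu[i])               # latent abundance
    for (j in 1:nvisit) {
      logit(p[i, j]) <- alpha0
      y[i, j] ~ dbin(p[i, j], N[i])   # observed counts
    }
  }
})
# Using the simulated objects (y, forest, nsite, nvisit) from the sketches above
samples <- nimbleMCMC(nmix_code,
                      constants = list(nsite = nsite, nvisit = nvisit, forest = forest),
                      data = list(y = y),
                      inits = list(N = apply(y, 1, max) + 1,
                                   beta0 = 0, beta1 = 0, alpha0 = 0),
                      monitors = c("beta0", "beta1", "alpha0", "mu", "N"),
                      nchains = 2, niter = 10000)
```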

Results

When applied to single-season N-mixture models, the WAICj approach ranked the correct model as the top model more often than the other model selection methods (Table 2), and identified the correct top model in 74.2% of the simulations. Standard conditional WAIC correctly identified the top model in 59.3% of the simulations. Model selection using posterior predictive loss performed poorly under all scenarios and only identified the correct top model in 10.3% of simulations. There was no difference in performance between the standard WAIC and WAICj approaches for the low detection probability scenario. For the simulated single-season N-mixture models, increased sample size led to more accurate model selection for both the standard WAIC and WAICj approaches (Fig. 1).

Table 2 Proportion of simulations (n = 100) where the generating model was chosen as the top ranked model.
Figure 1

Weight given to the generating model when simulated detection probability was close to 0 (A) or approaching 1 (B). Each simulation scenario was analyzed 100 times with data from 15, 25 or 50 sites.

Both standard WAIC and WAICj tended to become more accurate as the number of simulated visits per site increased, regardless of the detection scenario (Fig. S1). Posterior predictive loss showed a similar pattern to WAIC when detection was close to 1, but its performance dropped with increased visits when detection was close to 0. Regardless of the number of site visits, WAICj was the most accurate of the three tested methods at identifying the generating model.

For the multi-season models with time trends, WAICj correctly identified the generating model 86–100% of the time (Table 2). Both WAIC and WAICj weighted the generating model close to 1 when it was identified as the top model (Fig. 2). The joint-likelihood approach consistently performed better under high detection for both the weak and strong time trend scenarios. Conditional WAIC identified the correct model in 45% of simulations with a weak time trend when detection was high, but in 98–99% of simulations when a strong time trend was present. Posterior predictive loss correctly chose the generating model in 0–47% of the time-trend scenarios (Fig. 3).

Figure 2

Weight given to the generating model when a weak or strong time trend was present in abundance and detection probability was close to 0 (low detection) or approaching 1 (high detection). Each scenario was analyzed 100 times with data from 15, 25 and 50 sites.

Figure 3

Proportion of simulations in which each alternative model was selected as the top model when no time trend, a weak time trend or a strong time trend was present in abundance and detection probability was approaching 0 (low detection scenario). Each scenario was analyzed 100 times with data from 15, 25 and 50 sites.

Between April and July, 2019, 24 Bewick’s wrens, 101 mourning doves and 350 house finches were detected across 180 eBird checklists at 48 locations. Model selection results were inconsistent across the three species (Table 3). For Bewick’s wren and mourning dove, the three methods disagreed on the top model. For the house finch, the conditional and joint-likelihood approaches supported the same top model. For all three species, posterior predictive loss indicated different top models than the other selection approaches.

Table 3 Model weight given to the top 3 candidate models for three bird species detected by eBird surveys in southern California from April to July, 2019.

Despite differences in model ranking, there was little difference in the overall estimated abundance for Bewick’s wren or mourning dove based on the top model selected by each method (Fig. 4). However, the top model for house finch selected by both the conditional and joint-likelihood approaches to WAIC suggested a substantially higher abundance than the model selected by the posterior predictive loss approach. We did not evaluate goodness of fit for any of the models, as this was outside the scope of the study.

Figure 4

Posterior distributions of abundance for three bird species detected by eBird surveys in southern California from April to July, 2019. Violin plots show the posterior distribution from the top model selected by the traditional, joint-likelihood or posterior predictive loss approaches to WAIC.

Discussion

Model selection is critical for discriminating between alternative ecological hypotheses4,6. Hierarchical models are now ubiquitous in ecology, but determining appropriate model selection methods remains an ongoing challenge2,5,24. WAIC is commonly applied to hierarchical Bayesian models much as AIC is applied in maximum likelihood analyses, especially when data are plentiful and detection probability is moderate5,11. However, we found that the conditional calculation of WAIC was inconsistent in low-data scenarios and when detection parameters approached the boundaries of their support. The joint-likelihood formulation of WAIC (WAICj) proved more accurate than the standard conditional formulation, even when models were temporally correlated, suggesting that WAICj is a robust alternative for model selection.

We expected increased sampling intensity (a larger number of sites) to improve model selection accuracy, but results were inconsistent across simulation scenarios and between selection methods. In both the single-season and weak time trend scenarios, posterior predictive loss showed an inverse relationship between accuracy (correctly identifying the top model) and the number of sites in the data set. This did not correspond to a lack of data, as we saw the same behavior when additional sites were added to our single-season high detection scenario. Posterior predictive loss prioritizes the fit of the observed data to the observation model, as demonstrated by its increased selection of the incorrect process model (model 4) as sampling intensity increased. Thus, when two candidate models produce similar posterior predictive distributions under the observation process, their penalty terms are also similar, leading to inaccurate conclusions25. This behavior suggests that large sample size alone does not guarantee accurate model selection, particularly when the fit of the process model is not directly incorporated in the selection criterion.

Despite improvements in model selection accuracy relative to posterior predictive loss and the traditional formulation of WAIC, WAICj performed poorly when evaluating single-season N-mixture models with a small number of sites. We suspect this stems in part from the N-mixture model structure. One drawback of N-mixture models is the limited information contained in repeated, unmarked counts. When the number of sites or the number of independent visits per site is low, model parameters may not be clearly identifiable26,27. As Barker et al.28 demonstrated using AIC, sparse data generated under an N-mixture model are practically indistinguishable from data generated under alternative models when detection probability is constant (or close to constant). Therefore, the appropriate use of WAICj is naturally limited to sample sizes large enough to produce fully identifiable models.

When applied to eBird data, the three model selection approaches tended to select different top models. While the conditional and joint-likelihood approaches to WAIC always agreed on the observation process and gave similar estimates of total abundance, posterior predictive loss consistently chose alternative candidate models. Notably, the top model for house finch selected by posterior predictive loss suggests approximately 4000 fewer individuals than the top models selected by WAIC and WAICj. In conjunction with the simulation results, this deviation likely reflects the unsuitability of posterior predictive loss for discriminating between alternative N-mixture models. Moreover, the inconsistencies between the top models selected by each criterion highlight the potential for poor model selection methods to lead to inaccurate inference on the ecological relationships between a species of interest and its environment.

Posterior predictive loss performed unreliably across all simulation scenarios, and we caution ecologists against using this metric for model selection. Previous research has suggested that posterior predictive loss can be an effective model selection method when data are observed without error15 and should be suitable for correlated data5. However, the consistency of posterior-predictive-based criteria appears to depend on the model structure25, and such criteria may perform poorly when data are only partially observed29. While we cannot categorically conclude that posterior predictive loss is inappropriate for all model selection tasks, we do not recommend this approach for discriminating between alternative N-mixture models.

Previous research has shown that WAIC is inappropriate for model selection with correlated data, especially for spatial models such as spatial capture-recapture2,5. An underlying assumption in the calculation of WAIC is independence of the data given the parameters, an assumption that was violated in our multi-season simulations. The conditional calculation of WAIC performed inconsistently when a time trend was present in the model, reflecting that violating the independence assumption is problematic for this criterion. However, we found that WAICj remained a valid choice for model selection even when data were temporally correlated. Based on these results, we suggest that WAICj may be appropriate for both single-season and multi-season N-mixture models, but urge caution when applying it outside of an N-mixture modeling framework. Further research could address the utility of the joint-likelihood approach developed here for other hierarchical Bayesian models.

Model selection methods, even for relatively simple hierarchical models, require further development to produce consistent and reliable results. WAICj produced more consistent results than conditional WAIC or posterior predictive loss, but all of the tested criteria performed poorly in at least one simulation scenario. For many ecological studies, detection probability is low9 and extensive sampling is cost-prohibitive. Thus, while N-mixture models offer a flexible solution for modeling count data, we suggest caution when performing model selection and urge ecologists to carefully consider the weaknesses of their chosen criterion when evaluating ecological hypotheses.