1 Introduction

Conservation decision-making relies on species distribution models (SDMs) to provide spatially explicit predictions of species occurrence and inferences regarding species-habitat associations with uncertainty. Static environmental variables, such as elevation, are readily available as interpolated spatial surfaces with grid-based values and frequently included as predictors in SDMs. However, adjusting inferences about species occurrence or abundance relative to potential exposure to detrimental factors (e.g., disease vectors) often requires quantification of stressors at unsampled locations. When the stressor is quantified on an areal unit and the species response is measured at a point location, the different spatial resolutions present a change-of-support problem. The spatial misalignment requires an approach that imputes or predicts the stressor variable at new areas while properly propagating the uncertainty in those predictions into the species response model. Spatially misaligned regression modeling (Banerjee et al. 2015, pp. 206–212), although more common in human health exposure applications (Warren et al. 2012; Lee et al. 2015; Cameletti et al. 2019), has not been used to assess a wildlife population response to a known disease. Our work is motivated by White-Nose Syndrome (WNS), which has decimated North American hibernating bats (U.S. Fish and Wildlife Service 2012) and resulted in two species being proposed for federal protection under the Endangered Species Act (U.S. Fish and Wildlife Service 2022a, b).

White-Nose Syndrome is a wildlife disease characterized by cutaneous infection during hibernation caused by the cold-adapted fungus Pseudogymnoascus destructans (Pd; Gargas et al. 2009; Lorch et al. 2011; Frick et al. 2015). The disease was first detected in North America in New York, USA, in 2006 (Turner et al. 2011) and has since spread to 40 states in the USA and eight Canadian provinces (Frank et al. 2019; WNS Response Team 2023), resulting in millions of fatalities (U.S. Fish and Wildlife Service 2012) and population declines exceeding 90% in several of the most susceptible species (Cheng et al. 2021). The disease presents as white fungal hyphae and lesions on the muzzle, wings, and ears of afflicted bats (Gargas et al. 2009; Chaturvedi et al. 2010) and causes mortality through disruption of torpor patterns during hibernation, leading to premature depletion of fat reserves and subsequent death due to starvation (Reeder et al. 2012; Frank et al. 2019). Studies documenting the decline in overwintering bat populations associated with WNS are based on count-based cave surveys jointly conducted with sampling for Pd and WNS on individual bats (Cheng et al. 2021). In order to measure Pd or document evidence of WNS, bats need to be captured and handled or visually inspected. The challenge in the western USA is that most of the susceptible bat species do not hibernate in known caves, mines, or other locations that are accessible for winter survey (Blejwas et al. 2023). Consequently, locating and capturing western bat species during hibernation, when their fungal loads are greatest, is extremely challenging. As a result, obtaining a sufficient number of samples to inform disease presence/absence at a given location is non-trivial (U.S. Geological Survey 2022).

In coordination with the US Fish and Wildlife Service’s National White-Nose Syndrome Response Plan (U.S. Fish and Wildlife Service 2011), the US Geological Survey’s National Wildlife Health Center designed and implemented a national surveillance program in 2012 to detect and monitor the spread of Pd across the USA. The original continental Pd sampling design was informed by a dynamic spatial diffusion model that identified high-risk areas where Pd was predicted to have spread in a given season (U.S. Geological Survey 2019, 2022). The associated sampling design results in intensive data collection along the predicted westward front of Pd spread. In the winter of 2021, a year after Pd was detected in Montana, Montana Fish, Wildlife, and Parks supplemented the continental design by creating a 36-cell lattice to guide surveillance data collection efforts to ensure statewide coverage (Fig. 1). Also in response to WNS, a separate and distinct collaborative monitoring program was created in 2015 to track the status and temporal trajectories of bat populations across North America, known as the North American Bat Monitoring Program (NABat; Loeb et al. 2015). The NABat plan defines a lattice of 10 km \(\times \) 10 km grid cells and an associated generalized random-tessellation stratified sampling design (GRTS; Stevens and Olsen 2003) to provide a spatially balanced master sample (Fig. 1). The NABat program employs passive acoustic recording units (ARUs) to measure relative bat activity during the pre-volant period (Loeb et al. 2015). A collective of Montana state agencies, federal agencies, non-government organizations, and tribal governments adopted the NABat design and has collaboratively conducted ARU-based surveys at 87 grid cells across the state in June and July annually since 2020 (Fig. 1).
The best practices for integrating disease surveillance and population monitoring data at different spatial resolutions are not well understood, and previous attempts have struggled to appropriately propagate error and quantify uncertainty (Merkle et al. 2018). To our knowledge, this work is the first attempt to integrate data collected from a wildlife disease surveillance program and a population monitoring program through a joint spatial model to inform landscape conservation planning.

Previous data integration techniques in ecology constructed a joint likelihood composed of independent probability distributions for each response-type conditional on a common latent parameter(s), as in spatial data fusion or integrated SDM applications (Pacifici et al. 2017; Fletcher et al. 2019; Miller et al. 2019). For example, the latent species occurrence state of an areal unit can be jointly informed by capture and point-count surveys (Miller et al. 2019). In contrast, when the goal of inference is to assess how a disease impacts a population of interest, the parameter associated with the directional relationship between the exposure and response is the parameter of interest (e.g., a regression coefficient). Inference on how modeled quantities affect a response using regression tools is common in environmental health applications that investigate how exposure to a toxin or air pollutant affects a human population outcome (e.g., mortality or birth rates; Warren et al. 2012; Cameletti et al. 2019). If the exposure and response data are collected at different spatial resolutions, such as the Pd surveillance and acoustic monitoring of bats in Montana, techniques for spatially misaligned regression must be considered.

A common approach to misaligned regression modeling in the domain of human health is the so-called plug-in method (Lee and Shaddick 2010; Lee et al. 2015; Pannullo et al. 2016; Cameletti et al. 2019), in which the disease or exposure data are first modeled agnostic to the observed response. Then, the response data are regressed on a fixed one-number summary of exposure, such as the posterior mean or median. The “plug-in” technique presents a number of advantages: (1) it is easily accessible by practitioners and computationally efficient (Cameletti et al. 2019), (2) it accommodates varying degrees of missing data and spatial misalignment between the disease and response variables, and (3) it models the exposure independent from the response. In human health applications, the lack of connection between the exposure and response models in the “plug-in” method is considered beneficial. For example, Warren et al. (2012) noted that a joint model for air pollution and preterm birth rates is not reasonable because human birth outcomes are not expected to influence the quantity or distribution of airborne contaminants. In contrast, allowing bat activity to inform the distribution of Pd is desirable because bats could influence the spread of the fungus to environmentally suitable areas by dispersing Pd within a roost and transporting the fungus to new roosts (Lorch et al. 2011; Warnecke et al. 2012; Wibbelt et al. 2010). Additionally, the “plug-in” method does not propagate any of the uncertainty in estimation of exposure through to the response model, resulting in potentially biased and overly precise estimates of the association between the exposure and response (Cameletti et al. 2019).

Bayesian techniques for spatially misaligned regression use informative prior distributions to incorporate the uncertainty in predicting exposure at unsurveyed locations into the response model. First, the exposure data are modeled, which in our case is evidence of Pd or WNS at one of the 36 cells in Montana (Fig. 1). Then, the exposure variable is specified as a latent predictor within the response model with prior distributions informed by the posterior distribution from the exposure model (Warren et al. 2012; Powell and Lee 2013; Lee et al. 2016; Cameletti et al. 2019). For example, Cameletti et al. (2019) used estimated posterior distributions of air pollution exposure, represented as annual mean \(\text {NO}_2\) concentrations, as prior distributions on the latent predictor variables in their response model describing hospitalizations. Similarly, Warren et al. (2012) used estimated posterior predictive distributions of climate variables as prior distributions when modeling air pollution. Related methods rely on fitting the response model multiple times, each time selecting a posterior sample from the disease model as a fixed covariate, then combining parameter estimates from the multiple fitted response models. For example, Cameletti et al. (2019) introduce the “feed-forward” approach, in which J posterior samples are selected from the exposure model and the response model is fit for each posterior sample. Similarly, Zhang et al. (2022) used multiple posterior samples from a COVID-19 exposure model as inputs in a mortality response model. Relying on prior distributions for latent parameters in the response model informed by the posterior distributions from the exposure model, hereafter the “prior” method, allows for propagation of the uncertainty in the estimation of the exposure model through to the response model.

When assessing the impact of Pd occurrence on Western bat populations, we propose that a better solution is to specify a single integrated modeling framework that completely captures the shared information between the Pd surveillance data and acoustic activity data. We compare the “plug-in” and “prior” methods to a joint model using a simulation study and empirical data in the state of Montana (Fig. 1). Specifically, our interest is whether a single integrated modeling framework that fully propagates the uncertainty in estimation of the wildlife disease model through to the ecological count model results in more accurate and precise estimates of disease occurrence and appropriate uncertainty in estimation of disease impacts on the ecological count response. In the following sections, we describe development of a joint model that accommodates non-Gaussian responses, imperfect detection, and spatial misalignment. Then, we describe our joint model extension that addresses the nuances of the data available in Montana, including multiple years of data collection and within-season temporal correlation among acoustic-based survey events. We provide Stan code (Stan Development Team 2023b) and documentation (Stratton et al. 2023), such that our method is accessible and adaptable for other ecological disease applications where leveraging joint information shared by the exposure and the response could be advantageous.

Fig. 1

Data from both monitoring programs in 2021. The small grid cells represent sample locations from the NABat acoustic monitoring program (Loeb et al. 2015) and are colored by the log of average count of recordings per site-night within the cell. The large polygons represent the 36-cell lattice developed by MFWP and are colored by the proportion of survey events positive for Pseudogymnoascus destructans (Pd) within the polygon. Most Pd surveillance polygons received only a single sampling event, though some received as many as five. White fill represents unsampled grid cells and polygons.

2 Methods

2.1 Bat Population Monitoring Using Acoustic-Based Surveys

Between 2020 and 2022, Montana Fish, Wildlife, and Parks (MFWP), the Montana Natural Heritage Program (MTNHP), and other partner agencies conducted acoustic monitoring of bats consistent with guidance from the North American Bat Monitoring Program (NABat, Loeb et al. 2015). The survey design defines the spatial domain of interest as the state of Montana, and an associated subset from the NABat master sample provides a spatially balanced sample from the NABat grid (Fig. 1). Between 2020 and 2022, 87 grid cells were sampled each year. Within each grid cell, an average of four ARUs (detectors) was deployed for an average of four nights each; detectors were placed sufficiently far apart to minimize spatial dependence among recorded calls from separate detectors (Loeb et al. 2015). Each detector recorded echolocating bats between sunset and sunrise nightly, resulting in an average of 16 detector-night combinations (hereafter, site-nights) per grid cell.

Acoustic recordings were assigned a species label using the SonoBat acoustic classification software (Szewczak 2023); proposed species assignments and call sequence attributes were derived from one of three regional classifiers to account for variations in community composition across the state. In order to account for the potential false-positive and false-negative detections that can result from using automated classification software (Chambert et al. 2018), auto-classified species labels are often manually reviewed by experts prior to analysis (Banner et al. 2018; Reichert et al. 2018). For the acoustic data collected in Montana, manually verified species labels were not available for all call sequences as only a subset of the calls collected were reviewed to confirm species presence at each detector location and many call sequences were not identified to species by the classifier. To account for the lack of robust species identifications, recordings were classified by whether the frequency fell within the range of WNS-susceptible species, and analyses were conducted on the recordings attributed to WNS-susceptible species in an attempt to remove false-positive detections and restrict inference on the impact of WNS to only WNS-susceptible species. In Montana, the majority of WNS-susceptible species are of the genus Myotis (Bachen et al. 2018). Attributes of call sequences from individuals of known species, archived within the MTNHP’s bat call library, were used to determine a mean characteristic frequency threshold that separated Myotis bats from bats not susceptible to WNS. The frequency threshold, 34 kHz, was then used to restrict to recordings from only WNS-susceptible bats prior to analysis.

2.2 Pseudogymnoascus destructans (Pd) Surveillance Data

In 2019, MFWP began collaborating with the National Wildlife Health Center to implement Pd sampling (U.S. Geological Survey 2019, 2022), and Pd was first detected in the eastern part of the state during the winter of 2020–2021. Beginning in 2021, MFWP broadened its surveillance to include locations across the entire state in order to tie regional Pd status to summertime trends in bat activity informed by acoustic monitoring. To facilitate statewide monitoring within the constraint of finite resources, a 36-cell lattice of large rectangular polygons (hereafter, “Pd polygons”) was superimposed over the state, and the state aimed to conduct at least one sampling event within each polygon each year (Fig. 1). However, 12, 22, and 29 polygons were sampled in 2020, 2021, and 2022, respectively. In Table 1, we categorize each of the 3983 acoustic monitoring grid cells in terms of the degree of overlap between the acoustic and Pd data sources.

Within each polygon, local biologist expertise was used to identify hibernacula, spring emergence mist-net sites, or maternity roost sites for sampling; attempts were made to evenly distribute the survey type, including hibernacula surveys, live animal trapping, or pooled guano and environmental sampling, across the state. Hibernacula surveys involved swabbing hibernating bats, cave substrates, or collecting soil and guano. Live animal trapping involved early season mist-netting or trapping bats emerging from bat boxes between April and June. Pooled guano surveys were conducted by collecting fresh guano at early season roost sites in buildings, beneath bridges, or in bat boxes during spring emergence. Samples from all survey types were then assessed for presence of Pd using polymerase chain reaction testing (PCR) at either the National Wildlife Health Center or Oregon Veterinary Diagnostic Laboratory. A survey event was considered positive if at least one sample from the survey event tested positive for Pd.

Table 1 Count of NABat grid cells categorized by whether data were observed within the cell

2.3 Spatially Misaligned Regression Models

We consider three methods for estimating the association between Pd occurrence and WNS-susceptible species bat relative activity: (1) the “plug-in,” (2) the “prior,” and (3) the “joint.” To facilitate our discussion of these methods, we assume the following notation. Let \(\mathcal {A}_k\), \(k = 1, \dots , K\), denote the polygons defined for monitoring of Pd (i.e., 36-cell lattice covering MT), \(x^\mathcal {A}_k\) denote the probability of Pd occurrence in polygon \(\mathcal {A}_k\), and i[k] index NABat grid cell i within polygon \(\mathcal {A}_k\); NABat grid cells located on the boundaries of Pd polygons were attributed to a single polygon by the area of greatest overlap. Let \(y_k\) denote the number of positive Pd sampling events out of \(n_k\) events in polygon \(\mathcal {A}_k\), \(Z_{i[k]}\) denote the latent occurrence state of WNS-susceptible species in grid cell i within polygon \(\mathcal {A}_k\), and \(c_{ij[k]}\) denote the observed count of recordings classified as any WNS-susceptible species at grid cell i during visit j within polygon \(\mathcal {A}_k\). We assume a zero-inflated negative binomial sampling model for WNS-susceptible species bat activity:

$$\begin{aligned} Z_{i[k]}\sim & {} \text {Bernoulli}(\psi _{i[k]}), \nonumber \\ c_{ij[k]} | Z_{i[k]} = z_{i[k]}\sim & {} \text {Negative binomial}(z_{i[k]}\mu _{ij[k]}, \xi ) \end{aligned}$$
(1)

where \(\text {logit}(\psi _{i[k]}) = \varvec{x}_{i[k]}^{(z)}\varvec{\beta }^{(z)}\), \(\text {log}(\mu _{ij[k]}) = \varvec{x}_{ij[k]}^{(c)} \varvec{\beta }^{(c)} + \alpha x^\mathcal {A}_k\), the negative binomial distribution is parameterized by its mean and variance (Stan Development Team 2023b), and \(\varvec{x}_{i[k]}^{(z)}\) and \(\varvec{x}_{ij[k]}^{(c)}\) denote vectors of covariates associated with occupancy of a grid cell and relative activity at a site-night, respectively. We use superscripts on the coefficients \(\big (\mathrm{e.g.,} \varvec{\beta }^{(c)}\big )\) and associated covariates \(\big (\mathrm{e.g.,} \varvec{x}_{ij[k]}^{(c)}\big )\) to distinguish the responses with which they are affiliated. The zero-inflation component accounts for potential false negatives arising from no recordings classified to WNS-susceptible species on any site-night even though at least one species truly occurred within a grid cell (MacKenzie et al. 2002).
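As a minimal sketch of how Eq. (1) contributes to a likelihood, the latent state \(Z_{i[k]}\) can be summed out for each grid cell. The function names below are hypothetical, and the sketch assumes a mean-dispersion form of the negative binomial (variance \(\mu + \mu ^2/\xi \)), which may differ from the parameterization used in the fitted models.

```python
import math


def nb_logpmf(c, mu, xi):
    """Log pmf of a negative binomial with mean mu > 0 and dispersion xi > 0.

    Assumed parameterization: variance = mu + mu**2 / xi.
    """
    return (math.lgamma(c + xi) - math.lgamma(xi) - math.lgamma(c + 1)
            + xi * math.log(xi / (xi + mu))
            + c * math.log(mu / (xi + mu)))


def zinb_cell_loglik(counts, mus, psi, xi):
    """Marginal log-likelihood of one grid cell's site-night counts.

    counts: observed counts c_ij for the cell; mus: matching means mu_ij;
    psi: occurrence probability in (0, 1). The latent Z is summed out.
    """
    occupied = math.log(psi) + sum(
        nb_logpmf(c, m, xi) for c, m in zip(counts, mus))
    if any(c > 0 for c in counts):
        return occupied  # any positive count implies Z = 1
    # All zeros: occupied but unrecorded, or truly unoccupied (Z = 0).
    unoccupied = math.log(1.0 - psi)
    hi = max(occupied, unoccupied)
    return hi + math.log(math.exp(occupied - hi) + math.exp(unoccupied - hi))
```

For a cell with only zero counts, the two branches correspond to the false-negative and true-absence explanations described above; summing them keeps the likelihood free of discrete parameters, which is convenient for gradient-based samplers.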

The ecological response model described in Eq. (1) requires specification of \(x^\mathcal {A}_k\), the probability of Pd occurrence in polygon \(\mathcal {A}_k\), in order to estimate \(\alpha \), the association between the probability of Pd occurrence and the log-mean WNS-susceptible bat activity. While the “plug-in,” “prior,” and “joint” methods differ in their propagation of the uncertainty in estimation of \(x^\mathcal {A}_k\) through to the ecological response model, they all rely on the same model structure for evidence of Pd occurrence. Let \(y_k\) denote the observed count of positive Pd survey events from \(n_k\) trials in polygon \(\mathcal {A}_k\) and \(\varvec{s}_k\) denote the centroid of the \(k^\textrm{th}\) observed Pd polygon. The model for Pd occurrence is:

$$\begin{aligned} y_k\sim & {} \text {Binomial}\big (n_k, x_k^\mathcal {A}\big ), \nonumber \\ \text {logit}(x_k^\mathcal {A})= & {} \varvec{x}_k^{(y)} \varvec{\beta }^{(y)} + \eta _k, \nonumber \\ \varvec{\eta }\sim & {} \mathcal {N}\left( \varvec{0}, \varvec{\Sigma _{11}} \right) \end{aligned}$$
(2)

where the element in the \(i^\textrm{th}\) row and \(j^\textrm{th}\) column of \(\varvec{\Sigma }_{11}\), denoted \(\varvec{\Sigma }_{11}^{ij}\), is given by \(\varvec{\Sigma }_{11}^{ij} = \sigma ^2 \exp \left\{ -\frac{1}{2\phi ^2} ||\varvec{s}_i - \varvec{s}_j||^2 \right\} \). To obtain predictions of \(x^\mathcal {A}_k\) in unsampled Pd polygons, we consider the spatial random effects from observed polygons, \(\varvec{\eta }\), and unobserved polygons, \(\varvec{\eta }^*\), as a multivariate normal process. Let \(\varvec{s}^*_{k}\) denote the centroids of the unobserved Pd polygons. Then,

$$\begin{aligned} \begin{bmatrix} \varvec{\eta } \\ \varvec{\eta }^* \end{bmatrix} \sim \mathcal {N}\left( \begin{bmatrix} \varvec{0} \\ \varvec{0} \end{bmatrix}, \begin{bmatrix} \varvec{\Sigma }_{11} &{} \varvec{\Sigma }_{12} \\ \varvec{\Sigma }_{21} &{} \varvec{\Sigma }_{22} \\ \end{bmatrix} \right) \end{aligned}$$
(3)

where \(\varvec{\Sigma }_{12}^{ij} = \sigma ^2 \exp \left\{ -\frac{1}{2\phi ^2} ||\varvec{s}_i - \varvec{s}^*_j||^2 \right\} \), \(\varvec{\Sigma }_{22}^{ij} = \sigma ^2 \exp \left\{ -\frac{1}{2\phi ^2} ||\varvec{s}^*_i - \varvec{s}^*_j||^2 \right\} \), and \(\varvec{\Sigma }_{21} = \varvec{\Sigma }_{12}^T\). Conditioning on the observed spatial random effects yields a multivariate normal distribution for unobserved random effects, with covariance matrix \(\varvec{\Sigma }_{2|1} = \varvec{\Sigma }_{22} - \varvec{\Sigma }_{21} \varvec{\Sigma }^{-1}_{11} \varvec{\Sigma }_{21}^T\) and mean vector \(\varvec{\mu }_{2|1} = \varvec{\Sigma }_{21} \varvec{\Sigma }_{11}^{-1} \varvec{\eta }\) (Eaton 1983, pp. 116–117).
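The conditional mean and covariance above can be sketched in a few lines. This is an illustration only, using the squared-exponential covariance of Eqs. (2) and (3) and a small hand-written linear solver so that it depends on nothing outside the Python standard library; the function names are hypothetical.

```python
import math


def sq_exp_cov(a, b, sigma2, phi):
    """Cross-covariance: sigma2 * exp(-||a_i - b_j||^2 / (2 * phi^2))."""
    return [[sigma2 * math.exp(
                -sum((u - v) ** 2 for u, v in zip(p, q)) / (2.0 * phi ** 2))
             for q in b] for p in a]


def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(m):
            s = M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X


def conditional_mvn(s_obs, s_new, eta, sigma2, phi):
    """Mean and covariance of eta* given eta at the observed centroids."""
    n_obs, n_new = len(s_obs), len(s_new)
    S11 = sq_exp_cov(s_obs, s_obs, sigma2, phi)
    S21 = sq_exp_cov(s_new, s_obs, sigma2, phi)
    S22 = sq_exp_cov(s_new, s_new, sigma2, phi)
    S12 = [list(col) for col in zip(*S21)]      # Sigma_12 = Sigma_21^T
    W = solve(S11, S12)                         # Sigma_11^{-1} Sigma_12
    w_eta = solve(S11, [[e] for e in eta])      # Sigma_11^{-1} eta
    mean = [sum(S21[i][k] * w_eta[k][0] for k in range(n_obs))
            for i in range(n_new)]
    cov = [[S22[i][j] - sum(S21[i][k] * W[k][j] for k in range(n_obs))
            for j in range(n_new)] for i in range(n_new)]
    return mean, cov
```

Two limiting cases give a quick sanity check: a new centroid that coincides with an observed one recovers that random effect with essentially zero conditional variance, while a centroid far from all observed polygons (relative to \(\phi \)) reverts to mean zero and variance \(\sigma ^2\).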

In order to fully propagate the uncertainty in estimation of the disease model through to the ecological response model and leverage shared information between the two data sources, the proposed joint modeling framework estimates the ecological response model described in Eq. (1) and the spatial model for disease occurrence described in Eq. (2) simultaneously in a single hierarchical framework. Conversely, the “plug-in” method relies on a two-step process. First, the disease surveillance data are modeled according to Eq. (2) to obtain posterior summaries, such as the mean or median, of the probability of Pd occurrence within polygon \(\mathcal {A}_k\), denoted \(\bar{x}_k^\mathcal {A}\). Then, those summaries are included as a fixed covariate in the ecological response model, \(\text {log}(\mu _{ij[k]}) = \varvec{x}_{ij[k]}^{(c)} \varvec{\beta }^{(c)} + \alpha \bar{x}^\mathcal {A}_k\).

The “prior” method also relies on a two-step process, first modeling the disease occurrence according to Eq. (2) to obtain posterior distributions for \(x_k^\mathcal {A}\). However, rather than using posterior summaries of \(x_k^\mathcal {A}\) as a fixed covariate in the ecological response model, the “prior” method treats these probabilities as unknown and specifies informative prior distributions based on the posterior distributions of \(x_k^\mathcal {A}\) obtained from the model fit to the Pd surveillance data. In the case of probabilities, one may use the mean-variance parameterization of the beta distribution (Ferrari and Cribari-Neto 2004) as a prior distribution for the latent probabilities. That is, let \(\text {log}(\mu _{ij[k]}) = \varvec{x}_{ij[k]}^{(c)} \varvec{\beta }^{(c)} + \alpha x^\mathcal {A}_k\) where

$$\begin{aligned} x_k^\mathcal {A} \sim \text {beta}(a_k, b_k), \end{aligned}$$
(4)

where \(a_k = \mu _k \phi _k\), \(b_k = (1 - \mu _k)\phi _k\), \(\phi _k = \frac{\mu _k(1-\mu _k)}{\tau _k^2} - 1\), and \(\mu _k\) and \(\tau ^2_k\) are the posterior mean and variance of \(x_k^\mathcal {A}\) from the disease model, respectively (Eq. 2). Similar to the joint modeling framework, this approach allows for propagation of the uncertainty in the estimation of \(x_k^\mathcal {A}\) through to the ecological response model.
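The moment matching in Eq. (4) is short enough to sketch directly. The function name is hypothetical, and the input check reflects the fact that a beta distribution exists only when \(\tau _k^2 < \mu _k(1 - \mu _k)\).

```python
def beta_from_moments(mu, tau2):
    """Return beta shape parameters (a, b) with mean mu and variance tau2."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if not 0.0 < tau2 < mu * (1.0 - mu):
        raise ValueError("tau2 must lie strictly between 0 and mu * (1 - mu)")
    phi = mu * (1.0 - mu) / tau2 - 1.0   # precision, equal to a + b
    return mu * phi, (1.0 - mu) * phi
```

For example, a posterior mean of 0.3 with posterior variance 0.01 gives \(\phi _k = 20\) and hence \(a_k = 6\), \(b_k = 14\). Note that this prior matches only the first two posterior moments for each polygon separately, so any posterior correlation among polygons is not carried into the response model.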

2.4 Simulation Study

We conducted a simulation study to compare the three models (“plug-in,” “prior,” and “joint”) with respect to estimation error and uncertainty in the posterior distributions for two outcomes of interest: Pd occurrence probabilities \(\big (x^\mathcal {A}_k\big )\) and the log-linear association between Pd occurrence and relative activity of WNS-susceptible bat species (\(\alpha \)). We considered three scenarios representing varying magnitudes for the negative association between the evidence of the fungus and our measure of relative bat activity: “no effect” (\(\alpha = 0\)), “moderate effect” (\(\alpha = -1\)), or “strong effect” (\(\alpha = -3\)). For each of the three scenarios, 50 data sets were generated from the joint model described by Eqs. (1), (2), and (3) assuming that nine polygons were randomly sampled for Pd and 100 grid cells were randomly selected for bat population monitoring using acoustic recording devices. Selecting the joint model as the generating mechanism allowed for investigating the impact of model choice on parameter estimates regardless of whether the Pd occurrence and count processes were associated by varying the value of \(\alpha \). The remaining assumed parameter values were based on empirical estimates obtained from fitting a simplified version of the fully specified “joint” model to the Montana data assuming only one year of sampling and excluding the nested random effect structure (see Sect. 2.5).

For each simulated data set, the “plug-in,” “prior,” and “joint” models were fit and posterior mean estimates, 95% credibility intervals, whether those credibility intervals captured the data generating values, and the squared-error (SE) were tracked. For a posterior mean probability of \(\hat{x}_k^\mathcal {A}\) and true generating probability of \(x_k^\mathcal {A}\), the SE was calculated as \(\big (x_k^\mathcal {A}-\hat{x}_k^\mathcal {A}\big )^2\). When summarizing the SE across all simulations, the square root of the mean of SEs, RMSE, was considered. For the two-step approaches, the Pd occurrence model described by Eqs. (2) and (3) was first fit to the simulated Pd data. Then, the ecological response model described in Eq. (1) was fit to the simulated acoustic data, incorporating posterior summaries of the disease occurrence probabilities in the linear predictor of the count response. The “plug-in” method used posterior mean estimates of disease occurrence probability as a covariate in the model, while the “prior” method assumed these probabilities were latent, relying on informed priors as described in Eq. (4). All models were fit using the probabilistic programming language Stan (Stan Development Team 2023a) and assessed for convergence with the Gelman–Rubin statistic (Brooks and Gelman 1998) and through visual inspection of trace plots.
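The two summaries tracked across simulations can be sketched as follows; the function names are hypothetical, and the inputs are assumed to be equal-length sequences of true values, posterior means, and 95% interval endpoints.

```python
import math


def rmse(truth, estimates):
    """Square root of the mean squared error between truth and estimates."""
    return math.sqrt(
        sum((t - e) ** 2 for t, e in zip(truth, estimates)) / len(truth))


def coverage(truth, lower, upper):
    """Proportion of intervals [lower, upper] that contain the true value."""
    hits = sum(lo <= t <= hi for t, lo, hi in zip(truth, lower, upper))
    return hits / len(truth)
```

With 50 data sets per scenario, empirical coverage of a 95% interval moves in steps of 0.02, so small departures from the nominal level should be read with that resolution in mind.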

2.5 Data Analysis

In order to estimate the association between Pd occurrence and bat activity between 2020 and 2022, the “plug-in,” “prior,” and “joint” modeling frameworks were fit to the data described in Sects. 2.1 and 2.2. For both of the two-step modeling approaches, the same iterative process described in Sect. 2.4 was used to incorporate estimates of Pd occurrence probabilities into the ecological response models. All three methods included the same covariates for the Pd \(\big (x_k^{(y)}\big )\) and bat activity \(\big (x_{ij[k]}^{(c)}, x_{i[k]}^{(z)}\big )\) processes, allowing for direct comparison between the methods.

The zero-inflation portion of each model included the mean elevation of the grid cell (meters), the average annual precipitation in the grid cell (millimeters), and the average annual temperature (degrees Celsius). A combination of bioclimatic and site-night specific covariates consistent with previous analyses of bat acoustic data was included in each negative binomial count model (e.g., Wright et al. 2018; Rodhouse et al. 2019; Stratton et al. 2022). Bioclimatic variables were obtained from the Parameter-elevation Regressions on Independent Slopes Model (PRISM, PRISM Climate Group 2022) at a 4 km resolution and included maximum nightly temperature (degrees Celsius). Site-night specific covariates included the log of the count of detections from the previous night, and an indicator for whether the detector was located by lentic water systems, lotic water systems, positioned in a flyway, positioned near a roosting structure, or other, and the first-order interaction between the log of the count of detections from the previous night and the site-type indicator. The linear predictor for Pd occurrence included the latitude, longitude, their interaction, and second and third degree polynomial effects. All continuous covariates were scaled to have a mean of zero and a standard deviation of one prior to estimating the models.

In order to account for potential temporal correlation in bat activity or Pd occurrence over the three surveyed years, first-order auto-regressive terms were included in both the negative binomial count and Pd occurrence portions of each model. Additionally, to account for the hierarchical nesting structure (consecutive survey nights at a detector location and multiple detector locations nested within a grid cell), nested detector-specific random intercepts were included in the activity portion of each model. Specifically, letting i index the NABat grid cell, d index the detector within grid cell i, and t index the year of survey, the random intercepts, \(\gamma _{idt}\), were drawn from the following hierarchical formulation:

$$\begin{aligned} \begin{aligned} \mu _t&\sim N(0, \sigma _{\mu }^2) \\ \theta _{it}&\sim N(\mu _t, \sigma _\theta ^2) \\ \gamma _{idt}&\sim N(\theta _{it}, \sigma _{\gamma }^2) \end{aligned} \end{aligned}$$
(5)
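To make the nesting in Eq. (5) concrete, the following sketch draws one set of random intercepts forward from the hierarchy. It is a simulation aid, not the fitting code; the function name and arguments are hypothetical, and standard deviations (not variances) are passed in.

```python
import random


def draw_random_intercepts(n_years, n_cells, n_detectors,
                           sd_mu, sd_theta, sd_gamma, seed=0):
    """Draw gamma[(i, d, t)] following the nesting year > cell > detector."""
    rng = random.Random(seed)
    gamma = {}
    for t in range(n_years):
        mu_t = rng.gauss(0.0, sd_mu)               # year-level mean
        for i in range(n_cells):
            theta_it = rng.gauss(mu_t, sd_theta)   # cell-within-year mean
            for d in range(n_detectors):
                # detector-specific intercept within cell i and year t
                gamma[(i, d, t)] = rng.gauss(theta_it, sd_gamma)
    return gamma
```

Setting `sd_theta` and `sd_gamma` to zero collapses every intercept in a year onto the year-level mean, which shows how each variance component controls the spread at its own level of the hierarchy.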

All models were fit using Stan (Stan Development Team 2023a), with three independent chains of 10,000 MCMC iterations each; each model was assessed for convergence visually and through the Gelman–Rubin statistic (Brooks and Gelman 1998). A complete description of the code used to fit the model is provided in the online Appendix (Stratton et al. 2023).

3 Results

3.1 Simulation Study

The precision of the posterior distribution and RMSE for the probability of Pd occurrence, \(x_k^\mathcal {A}\), depended on the choice of model, the assumed strength of association between Pd and bat activity (\(\alpha \)), and whether both data sources were observed within the Pd polygon (Figs. 2 and 3). Generally, precision was greater and RMSE was lower in estimating \(x_k^\mathcal {A}\) if Pd data were observed within \(\mathcal {A}_k\) across all methods and scenarios. Additionally, the “joint” method resulted in the greatest precision and lowest RMSE when estimating \(x_k^\mathcal {A}\) on average, followed by the “prior” and “plug-in” methods, respectively. However, in cases where the disease and ecological processes were independent (\(\alpha = 0\)), all three methods yielded approximately equivalent estimates of \(x_k^\mathcal {A}\) with similar precision and RMSE (Figs. 2 and 3, right panels).

The “plug-in” method resulted in nearly equivalent precision and RMSE in \(x_k^\mathcal {A}\) across the three values of \(\alpha \), and regardless of whether count data were observed within \(\mathcal {A}_k\) (Figs. 2 and 3). Conversely, the “joint” method resulted in increasing precision, and decreasing RMSE, as \(\alpha \) increased in magnitude regardless of whether count data were observed within \(\mathcal {A}_k\), though the increase in precision and decrease in RMSE were greater in polygons where count data were observed (Figs. 2 and 3). Together, these results suggest that the “joint” method results in both greater precision and greater overall accuracy than the “plug-in” method when estimating the probability of Pd occurrence, so long as the disease and count processes are related. For example, consider the predicted probabilities in unsurveyed Pd polygons that contained count data. In the case where Pd had a strong effect on the count process, the RMSE of predicted probabilities was greater for the “plug-in” method than for the “joint” method, on average (mean RMSE for “joint” of 0.0476, mean RMSE for “plug-in” of 0.2410). In the case where Pd had a moderate effect on the count process, the RMSE of predicted probabilities was again greater for the “plug-in” method than for the “joint” method on average, though the discrepancy was lesser (mean RMSE for “joint” of 0.0996, mean RMSE for “plug-in” of 0.2310).

For the “prior” method, the impact of increasing the magnitude of \(\alpha \) on the precision and RMSE of \(x_k^\mathcal {A}\) depended on whether count data were observed within \(\mathcal {A}_k\). If count data were observed within \(\mathcal {A}_k\), the precision of \(x_k^\mathcal {A}\) increased, and RMSE decreased, as \(\alpha \) increased in magnitude, similar to the “joint” method, though the changes were smaller. If count data were not observed within \(\mathcal {A}_k\), increasing the magnitude of \(\alpha \) had minimal impact on the precision and RMSE of \(x_k^\mathcal {A}\), similar to the “plug-in” method. Combined, these results suggest that, in terms of precision and accuracy, the “prior” method yields Pd occurrence probability estimates that are more similar to those from the “joint” method if count data are observed in a Pd polygon, but more similar to those from the “plug-in” method if count data are not observed in a Pd polygon. For example, consider the predicted probabilities in unsurveyed Pd polygons in the scenarios where there was a strong impact of Pd on the count process. The RMSE of predicted probabilities where count data were observed was greater for the “prior” method than for the “joint” method, on average (mean RMSE for “joint” of 0.0476, mean RMSE for “prior” of 0.0795). However, in the case where count data were not observed, the RMSE for the “prior” method was much greater than for the “joint” method, on average (mean RMSE for “joint” of 0.0720, mean RMSE for “prior” of 0.218). See Appendix A for visualizations of the posterior mean probability of Pd occurrence for a subset of simulations and scenarios.

When estimating \(\alpha \), all three methods resulted in nearly equivalent estimates with similar precision, yielding credibility intervals that achieved nominal coverage if the ecological and disease processes were independent (Fig. 4, bottom panel). In the case where the magnitude of \(\alpha \) was moderate, the “prior” and “joint” methods resulted in similar estimates of \(\alpha \), with the “prior” method resulting in slightly more precision than the “joint” method, and both methods yielding approximately nominal coverage (Fig. 4, middle panel). Conversely, the “plug-in” method resulted in credibility intervals that did not achieve nominal coverage, an effect that was exacerbated as the magnitude of \(\alpha \) increased (Fig. 4, top panel). In the case where there was a strong effect of Pd on the ecological response, the “prior” method again resulted in narrower credibility intervals than the “joint” method, though this increased precision came at the cost of bias, as the credibility intervals from the “prior” method did not provide nominal coverage and had lower coverage than those from the “joint” method (Fig. 4, top panel).
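The three summaries used throughout this section (credibility interval width, RMSE, and interval coverage) can be computed directly from posterior draws. The following is a minimal sketch, not the authors' code: the function names `quantile` and `summarize`, the input layout (one list of posterior draws per polygon), and the use of the posterior mean as the point estimate in the RMSE are all assumptions made for illustration.

```python
import math
import random
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize(draws_by_polygon, truth, level=0.95):
    """Mean interval width, RMSE of posterior means, and coverage.

    draws_by_polygon: one list of posterior draws per polygon (assumed layout).
    truth: data-generating probabilities, one per polygon.
    """
    tail = (1 - level) / 2
    widths, sq_errors, covered = [], [], []
    for draws, true_value in zip(draws_by_polygon, truth):
        ordered = sorted(draws)
        lower = quantile(ordered, tail)
        upper = quantile(ordered, 1 - tail)
        widths.append(upper - lower)
        # Assumption: the posterior mean is the point estimate being scored.
        sq_errors.append((statistics.fmean(draws) - true_value) ** 2)
        covered.append(lower <= true_value <= upper)
    return {
        "mean_width": statistics.fmean(widths),
        "rmse": math.sqrt(statistics.fmean(sq_errors)),
        "coverage": sum(covered) / len(covered),
    }


# Hypothetical posterior draws for three polygons (not real model output).
random.seed(1)
truth = [0.2, 0.5, 0.8]
draws = [
    [min(1.0, max(0.0, random.gauss(p, 0.05))) for _ in range(2000)]
    for p in truth
]
result = summarize(draws, truth)
print(result)
```

The same interval check applies to a scalar parameter such as \(\alpha \): collecting one interval per simulated data set and averaging the `covered` indicator gives the empirical coverage that is compared against the nominal 95% level.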

Fig. 2

Box plots of 95% credibility interval widths for Pseudogymnoascus destructans (Pd) occurrence probabilities colored by whether the estimated occurrence probability was for a polygon that contained Pd data and paneled by the assumed strength of the effect of Pd on the ecological response (columns) and whether count data were observed within the polygon (rows). Each individual value within a box plot represents the mean credibility interval width for all Pd probabilities from a single simulated data set that are consistent with the faceting conditions. Generally, the “joint” method results in the greatest precision, followed by the “prior” and “plug-in” methods, respectively. The effect of the strength of Pd on the precision from the “prior” method depends on whether count data were observed within the polygon

Fig. 3

Box plots of RMSE for Pseudogymnoascus destructans (Pd) occurrence probabilities colored by whether the estimated occurrence probability was for a polygon that contained Pd data and paneled by the assumed strength of the effect of Pd on the ecological response (columns) and whether count data were observed within the polygon (rows). Each individual value within a box plot represents the square root of the mean of the squared errors for all Pd probabilities from a single simulated data set that are consistent with the faceting conditions. Generally, the “joint” method results in the lowest RMSE, followed by the “prior” and “plug-in” methods, respectively

Fig. 4

Line plots of 95% credibility intervals for the effect of Pseudogymnoascus destructans (Pd) presence on the ecological response, paneled by the strength of the effect of Pd on the ecological response. The larger error bars represent average credibility intervals from each method and are colored by the proportion of credibility intervals that captured the data generating values. As the strength of the effect of Pd on the ecological response increases, the “prior” and “plug-in” methods fail to achieve nominal coverage

3.2 Data Analysis

Posterior mean estimates of the probability of Pd occurrence over time were similar for all three models and strongly suggest that the fungus has spread westward since 2020. Posterior uncertainty for the estimated probability of Pd occurrence was greatest in 2020, when the fewest polygons were sampled. Additionally, for all models considered, posterior uncertainty in the estimated probability tended to be greater in the middle of the state, along the leading edge of Pd spread within each year. Uncertainty in Pd occurrence probability tended to be lower for the “joint” model than for the “plug-in” and “prior” methods, particularly in 2020 when the fewest Pd polygons were sampled (Fig. 5), consistent with the results of the simulation study. The discrepancy in standard deviations of the Pd occurrence probabilities is smaller in 2022 for both comparisons in Fig. 5 (“joint”–“prior” and “joint”–“plug-in”). The apparent gain in precision for the “prior” and “plug-in” models is a result of the additional Pd surveillance data that are available in 2022; because less spatial prediction is required, the disjoint modeling frameworks benefit. Even so, the “joint” model results in smaller standard deviations for the probabilities of Pd occurrence in 2022, though the discrepancy is smaller than in 2020 and 2021.

Ninety-five percent credibility intervals for a subset of regression coefficients from the log-linear predictor of the negative binomial counts are provided in Fig. 6. After accounting for the other site-level and bioclimatic variables, there is evidence of a strong, positive association between log-mean bat activity and whether the site was located near lentic water systems or lotic water systems, relative to the baseline category of “other.” Furthermore, the association is stronger for lentic water systems than for lotic water systems, consistent with previous analyses of bat activity (Blakey et al. 2018). Conditional on the other predictors in the count model, there is also evidence of a strong positive association between the log-mean bat activity and the log of the number of detections from the previous night. After accounting for the site-level and bioclimatic variables, there was evidence of a negative association between the probability of Pd occurrence and bat activity across all three years, though there was a high degree of uncertainty in each of these estimates (Fig. 6).

Fig. 5

Map of the difference in posterior standard deviations of probabilities of Pseudogymnoascus destructans (Pd) occurrence between the (left) “joint” modeling approach and “plug-in” modeling approach and (right) “joint” modeling approach and “prior” modeling approach over time. Red fill denotes areas where the “joint” model resulted in greater precision

Fig. 6

Ninety-five percent credibility intervals for a subset of regression coefficients in the count model; intercept terms, temporal random effects, and detector-level random effects are omitted, and “lpc” denotes the log of the count of detections from the preceding night. There are positive point estimates associated with sites located near water, roosting structures, or within a flyway, the maximum nightly temperature, and the activity from the previous night. There are negative effects associated with occurrence of Pseudogymnoascus destructans and some interactions between site type and activity from the previous night

4 Discussion

We explored three approaches to spatially misaligned regression modeling that allow for estimating the association between a known stressor (in our application, Pd, a non-native fungus that causes WNS) and a species response. Our simulation study demonstrated the importance of propagating the uncertainty in estimating a modeled quantity that is then included as a predictor within an ecological response model. In the scenarios where there was a moderate effect of Pd on mean relative bat activity, the “plug-in” method resulted in an estimated effect of the disease (\(\alpha \)) that did not achieve nominal coverage rates. In the case where the disease had a strong impact on mean relative activity, both the “plug-in” and “prior” methods resulted in overprecise estimates of \(\alpha \), consistent with previous findings (Cameletti et al. 2019), with lower coverage rates than our “joint” modeling framework. For the Montana data sets, the posterior distributions for the estimated association between evidence of Pd and summertime acoustic activity for WNS-susceptible species were nearly equivalent among the three models. However, the estimated magnitude of \(\alpha \) was relatively small for all three models, placing the analysis closer to our “no effect” and “moderate effect” simulation scenarios; the similarity in posterior estimates was therefore consistent with our simulation findings.

Our simulation study revealed an interesting practical benefit of jointly modeling two biologically linked processes, as is likely in our application, in that the response data informed the modeled predictor even when the two data types were not co-located. On average, our “joint” model resulted in less posterior uncertainty when estimating Pd occurrence than did either the “plug-in” or “prior” methods. Similarly, in our empirical data analysis we found less uncertainty in the predicted Pd occurrence probabilities for the “joint” model relative to the “plug-in” and “prior” models, on average. Our simulations showed precision increased for estimated disease occurrence probabilities with increasing magnitude of the association between disease and the ecological response (\(\alpha \)). In general, the simulation study and empirical data analysis suggested that the joint modeling framework resulted in the largest reduction of uncertainty in disease occurrence probabilities when either of the two data sources was sparsely observed or the data sources were not co-located. Together, these results demonstrate the value of the joint modeling framework for spatially misaligned regression modeling when integrating costly data sources. In such cases, the joint modeling framework can be used to improve the precision of parameter estimates when collecting large volumes of data is difficult or unfeasible.

While the joint modeling framework provides more precise estimates of disease occurrence and fully propagates the uncertainty in estimation of disease occurrence through to the ecological response, some care must be taken when implementing the “joint” model. Notably, by modeling the disease occurrence and relative bat activity jointly, information about relative activity can inform estimation of disease occurrence (Warren et al. 2012). While this is a desirable property when modeling Pd occurrence and bat activity due to the transmission mechanics of WNS, it may not be desirable for all applications; linking the disease and ecological processes in this way is a modeling assumption that should be carefully assessed prior to analysis. Additionally, by considering disease occurrence and relative activity jointly, misspecification of the disease occurrence model can lead to biased estimates of the impact of the disease on the ecological response (Cameletti et al. 2019). If the two data sources are spatially misaligned and spatial predictions are required, misspecification of either model could also lead to biased predictions of the probability of Pd occurrence, potentially resulting in biased estimates of the impact of Pd on the log-mean count. Consequently, it is especially important to perform rigorous model assessment when implementing the joint model. Future work could consider the severity of the impact of model misspecification on the predicted probabilities of Pd occurrence.

Additional levels of complexity could be added to both the species distribution and spatial components of the model. For example, random intercepts could be included in the linear predictors for \(\psi \) and \(\mu \) to account for spatial correlation in the occupancy and count processes, respectively. If species-specific classifications are available for the acoustic recordings, multi-species count models can be implemented to improve model fit (Wright et al. 2020). Similarly, additional complexity in the Pd occurrence model, such as allowing for imperfect detection or for detection probabilities that differ by survey type (e.g., environmental, guano, or tissue), could be a useful next step to improve model fit (e.g., Campbell Grant et al. 2023). Additional temporal dynamics may also be included in the Pd occurrence model to account for potential lagged impacts of Pd occurrence. Finally, allowing \(\alpha \) to vary spatially could reveal more subtle relationships between WNS and summertime bat populations (e.g., Hastie and Tibshirani 1993; Fan and Zhang 1999; Finley 2011). Development of novel models for monitoring of wildlife disease remains an area of active research (Hefley et al. 2017; Hicks et al. 2020; Watsa and Wildlife Disease Surveillance Focus Group 2020; Wiens and Thogmartin 2022), and as the statistical methodology evolves, our joint modeling framework can be updated accordingly.

For the empirical analysis outlined in this paper, point-referenced spatial information was obfuscated due to sensitive locations used for Pd surveillance. However, the joint modeling framework described in this paper is well suited to handle point-referenced disease occurrence data; alternatively, the geostatistical spatial process may be replaced by techniques for areal data, including a Gaussian Markov random field (Banerjee et al. 2015). While we did not encounter any computational challenges during our empirical analysis due to the relatively small number of Pd polygons, re-considering the surveillance data as point-referenced could increase the computational burden, particularly when expanding to larger extents. The spatial Gaussian process (GP) specified in the “joint” model is prone to scalability issues; notably, its computational cost grows cubically with the number of spatially indexed observations (Liu et al. 2018). Recently, locally approximated GPs (Gramacy and Apley 2015) and nearest neighbor GPs (Datta et al. 2016) have been proposed as solutions to the computational burden of GP modeling for large spatial data sets. Additionally, data augmentation strategies could be explored to afford computationally efficient Gibbs updates of regression coefficients for the negative binomial sampling model (Polson et al. 2013), or MCMC techniques may be abandoned for computationally efficient alternatives, such as the integrated nested Laplace approximation (Rue et al. 2009). Such improvements would allow for comparison of additional modeling techniques, including the “feed-forward” approach (Cameletti et al. 2019). Future exploration into optimization of the joint modeling framework for large spatial domains would directly benefit wildlife conservation activities coordinated across species ranges.
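The cubic scaling mentioned above comes from factorizing the dense covariance matrix of the spatial GP. The sketch below illustrates this under stated assumptions and is not the model fit in this paper: it builds an exponential covariance matrix on hypothetical one-dimensional coordinates (the parameters `sigma2`, `phi`, and `nugget` are illustrative), runs a textbook Cholesky factorization, and counts the inner-loop multiplications.

```python
import math


def exponential_covariance(coords, sigma2=1.0, phi=0.3, nugget=1e-8):
    """Dense exponential covariance matrix for 1-D coordinates (illustrative)."""
    n = len(coords)
    return [
        [
            sigma2 * math.exp(-abs(coords[i] - coords[j]) / phi)
            + (nugget if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def cholesky_with_count(matrix):
    """Return the lower-triangular Cholesky factor and the multiplication count."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    multiplications = 0
    for i in range(n):
        for j in range(i + 1):
            partial = 0.0
            for k in range(j):
                partial += lower[i][k] * lower[j][k]
                multiplications += 1
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower, multiplications


def count_for(n):
    """Multiplication count for n equally spaced hypothetical locations."""
    coords = [i / n for i in range(n)]
    _, count = cholesky_with_count(exponential_covariance(coords))
    return count


counts = {n: count_for(n) for n in (20, 40, 80)}
for n, count in counts.items():
    print(n, count)  # inner-loop count equals (n - 1) * n * (n + 1) / 6
```

Each doubling of the number of locations multiplies the count by roughly eight, which is why approximations that restrict each location to a small neighbor set, such as the nearest neighbor GPs cited above, become attractive for large spatial domains.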

Future work could also focus on evaluating the impact of co-locating the Pd surveillance locations within the acoustic monitoring grid cells and consider optimal sampling designs for estimation of the “joint” model. Investigations into optimal designs for spatial modeling typically focus on guidance for selecting locations to measure one variable (e.g., relative bat activity) to minimize prediction errors or estimation errors for the spatial parameters (Zimmerman 2006; Irvine et al. 2007). However, exploring how to distribute sampling effort for both data sources within the joint modeling framework would be valuable. Optimal sampling designs for the Pd surveillance process would ideally result in reduced uncertainty in predicted Pd occurrence at unsurveyed locations, a property of maximum-entropy sampling designs (Shewry and Wynn 1987; Wang et al. 2020). However, the optimal design must also accommodate the nuances of ecological sampling. Co-location is often either impractical or impossible because of the biology of the species, as in our application, or constrained by organizational barriers. The practical challenge is that one organization may be responsible for collecting and managing the pathogen or disease data, while a different organization is responsible for coordinating the surveys for the potentially impacted wildlife species throughout its range. Our proposed joint modeling framework provides a means to integrate disparate biosurveillance surveys with expansive wildlife population monitoring efforts without requiring co-location. Importantly, the joint modeling framework is a statistical solution for integrating related but parallel monitoring activities to better inform species conservation across large landscapes.