Bayesian Models for Analysis of Inventory and Monitoring Data with Non-ignorable Missingness

We describe the application of Bayesian hierarchical models to the analysis of data from long-term, environmental monitoring programs. The goal of these ongoing programs is to understand status and trend in natural resources. Data are usually collected using complex sampling designs including stratification, revisit schedules, finite populations, unequal probabilities of inclusion of sample units, and censored observations. Complex designs intentionally create data that are missing from the complete data that could theoretically be obtained. This “missingness” cannot be ignored in analysis. Data collected by monitoring programs have traditionally been analyzed using the design-based Horvitz–Thompson estimator to obtain point estimates of means and variances over time. However, Horvitz–Thompson point estimates are not capable of supporting inference on temporal trend or the predictor variables that might explain trend, which instead requires model-based inference. The key to applying model-based inference to data arising from complex designs is to include information about the sampling design in the analysis. The statistical concept of ignorability provides a theoretical foundation for meeting this requirement. We show how Bayesian hierarchical models provide a general framework supporting inference on status and trend using data from the National Park Service Inventory and Monitoring Program as examples. Supplemental Materials Code and data for implementing the analyses described here can be accessed here: https://doi.org/10.36967/code-2287025.


INTRODUCTION
The central challenge of statistics is to assess the value of observations as evidence (Woodworth 2004). Inevitably, there are observations that are missing from the set of data we obtain, observations that could potentially inform the questions we seek to answer. Missing observations arise in two ways, by accident and by design. Observations are accidentally missing as a result of logistical problems, non-responses, instrument failures, or other unintended causes. This is what most practitioners think of when we speak of missing data. But data are also intentionally missing because realized experimental and sampling designs include a subset of all of the potentially informative observations. Reliable inference requires dealing with both types of missingness.
An important illustration of data that are missing by design arises from long-term monitoring programs that collect data to inform management of natural resources at local, regional, and national scales. Notable examples in the USA include the Natural Resource Conservation Service National Resources Inventory, the Forest Inventory and Analysis and Forest Health Monitoring programs of the US Forest Service, the Bureau of Land Management Assessment, Inventory, and Monitoring Strategy, and the Inventory and Monitoring programs of the National Park Service and US Fish & Wildlife Service. These programs collect large, diverse sets of data using complex sampling designs that include stratification, revisit schedules, finite populations, censored or truncated observations, and unequal probabilities of inclusion of sample units. The missingness created by these designs cannot be ignored in analysis.
Analysis of data from large monitoring programs has traditionally emphasized designbased inference implemented using variations of the Horvitz-Thomson estimator Stehman and Overton 1994;Messer et al. 1991;Courbois and Urquhart 2004;Kincaid et al. 2004). Design-based inference requires probabilities of inclusion of a sample unit in the analysis to give proper weight to samples drawn from a finite population (Irvine et al. 2018;Williams and Brown 2019). However, this approach imposes substantial limitations on inference beyond point estimates of means and variances (Gelman 2007;Williams and Brown 2019). Point estimates are not capable of supporting statistical inference on temporal trend or the predictor variables that might explain causes of trend, which instead requires model-based inference (Shanahan et al. 2021). Moreover, existing statistical packages using the Horvitz-Thompson estimator developed for analysis of data from complex designs, for example, the survey (Lumley 2015), spsurvey (Kincaid and Olsen 2019), and trendNPS (Starcevich and Mitchell 2017) packages in R (R Core Team 2020), lack the flexibility to deal with the many idiosyncrasies that inevitably arise in monitoring data.
Here, we apply Bayesian hierarchical models to the analysis of data from complex designs with non-ignorable missingness, using data from networks in the National Park Service Inventory and Monitoring Program as an example. Our goal is to illustrate a flexible approach for dealing with non-ignorable missingness that is firmly grounded in statistical theory and that expands the inferences that are feasible relative to those based on the Horvitz-Thompson estimator. Our intended audience includes statisticians interested in environmental monitoring as well as applied researchers and decision makers who use monitoring data. This paper is organized as follows. We begin by describing data sets obtained by the National Park Service Inventory and Monitoring Program (reviewed by Olsen et al. 1999;Fancy et al. 2009), highlighting the problem of missingness created by design. We then provide an overview of the concept of ignorability (sensu Gelman et al. 2013, Chapter 8) in the Bayesian framework, ideas that are fundamental to understanding the modeling approach we advocate. Next, we show a highly general model that can be modified to accommodate the different types of observations and different sources of missingness that arise in the Inventory and Monitoring data. Finally, we illustrate the use of this model with three specific applications involving different types of non-ignorable missing data.

MOTIVATION FOR MONITORING
Concern about threats to natural resources in national parks motivated the United States Congress to require the National Parks Service to implement a nationwide Inventory andMonitoring Program in 1998 (Fancy et al. 2009). The purpose of the Inventory and Monitoring Program is to provide scientifically reliable information about the status and trend of the health of park ecosystems over time, enabling park staff to make better decisions about managing natural and human resources in the face of rapid environmental change (Fancy et al. 2009). In the lexicon of the program, an "indicator" is an observable quantity describing an environmental attribute of interest (sensu Larsen et al. 2001). Examples of indicators include counts of individuals or species, visual classification of vegetative cover, and measures of soil stability and water chemistry. "Status" is an estimate of the mean or median value of the indicator within a defined temporal window. "Trend" is the longterm, directional change in an indicator over time apart from year-to-year variation that is caused by changes in weather, observers, instrumentation, disturbances, or other short-term influences on status.

SAMPLING DESIGN
There are 32 Inventory and Monitoring networks nationwide, roughly organized by ecoregion. Each network collaborated with park managers, academics, and agency scientists to identify indicators of the condition of resources in the parks, which, taken collectively, could be used to gauge the health of park ecosystems (Fancy et al. 2009). We built models to analyze data collected by six Inventory and Monitoring networks: the Southern Plains, Southern Colorado Plateau, Northern Colorado Plateau, Rocky Mountain, Sonoran Desert, and Chihuahuan Desert networks (Fig. 1). Although these networks include a variety of administrative units including national parks, national historic sites, and national monuments, we will refer to all units as "parks." Sampling was predominantly stratified random (Fig. 2a). Strata were chosen to represent major vegetation communities and soil types. Sites, typically 0.1-1 ha in size, were most often chosen within strata using a spatially balanced design (Stevens and Olsen 2004;Theobald et al. 2007). Inclusion probabilities π jkt for site j = 1, 2, ..., J kt within stratum k at time t were equal to π jkt = J kt /N k in some parks, but in others, the inclusion probability of a site was based on the cost of travel, c jk , to the site. Replicated observation units (plots and transects) were systematically located within sites.
Allocation of observations to sites over time followed a revisit design (Fig. 2b). Sites were deliberately omitted from sampling during some years, thereby allowing greater spatial coverage of the stratum over time while accommodating limited financial resources for collecting observations. Figure 2. a A schematic of a "generic" sampling design. Although each network often has a unique layout and vocabulary for sampling units, they all share the same basic components, including replicated observation units (plots, quadrats, and transects) within sites, with sites nested in strata, and strata within parks. b An example of a revisit design where a subset of sites are visited each year .

NON-IGNORABLE MISSINGNESS
The sampling design created data that were intentionally missing. There were five sources of non-ignorable missingness that must be accommodated in the analysis: 1. A revisit schedule created missing data at each site during some years. Proper inference on status across sites within a year and the trend across years must account for the missingness created by the revisit schedule, including missing response data and covariates.
2. Sites were selected using a stratified design. Inferences at park scales (across strata) must properly account for the stratification.
3. Sites sampled within a stratum were a set drawn from a finite number of potential sites.
4. Probabilities of inclusion of sites within strata in some parks were related to the cost of travel to the sites.
5. Some data were truncated and / or censored, creating observations that were partially missing.
We show how a general, Bayesian framework can properly account for all of these nonignorable features of the sampling designs. In the next section, we outline the general theoretical principles supporting inference from designs with non-ignorable missing data. We then describe our general approach to modeling before turning to specific applications.

IGNORABILITY
Financial constraints and the need to represent environmental variation over space and time motivate environmental monitoring programs to implement sampling designs that depart from simple random sampling. Reliable inference from observations collected in the complex designs typical of these programs depends on including information in the analysis about how the data were collected. The concept of ignorability is fundamental to deciding what information must be included. Here, we provide a general, theoretical underpinning needed to understand the specific examples of analysis that follow.

THE COMPLETE DATA LIKELIHOOD
All sampling designs result in a subset of observations from the population we seek to understand. The complete data include both the observed and the missing data (Gelman et al. 2013;Link and Barker 2010). Properly modeling this missingness is required for inference in all but the simplest designs. We explain how this requirement is met in the Bayesian framework following the reasoning of Gelman et al. (2013, Chapter 8).
We define the matrix Y = (y 1 , y 2 , ..., y n ) to include all of the potential observations that could be obtained from the population of interest. Each of the y i is a vector with elements y i j , j = 1, 2, ..., J representing, for example, a single point in a time series of J years at i = 1, 2, ..., n different sites. The same logic we develop here applies if y i is a scalar and y is a vector of all of the potential observations. We define the inclusion matrix Q as a matrix of the same dimension as Y indicating whether y i j is observed, q i j = 1, or missing, q i j = 0. The observed values of Y are Y obs , and the unobserved values are Y miss . We emphasize for the purposes here that missing implies observations that are omitted by design, but it could also indicate observations that are unintentionally missing, as might occur when an observation was planned but subsequently omitted due to unforeseen logistical impediments. We notate fully observed covariates as X, including predictor variables (for example, weather or geophysical variables, and indicators for observers) as well as covariates that specify elements of design, for example, membership in strata or timing of repeated measures in revisit schedules. The quantities θ are parameters of the distribution of the complete data and φ are parameters of the inclusion matrix. The complete data likelihood given parameters θ, φ, and covariates X is (1) The bracket notation [a | b, c] reads the probability or probability density of a conditional on b and c (Gelfand and Smith 1990). Model parameters θ , φ are distributed jointly conditional on the observed quantities X, Y obs , and Q We obtain marginal posterior distributions of θ by integrating out φ and the missing data in Eq. 2 which gives the posterior distribution of θ averaged over φ for non-ignorable designs. This integral can be approximated by posterior simulations of the joint vector of unobserved quantities Y miss , φ, and θ, which are then used to compute quantities of interest (means, credible intervals). For example, draws from the joint distribution of φ and θ could be used to impute the missing observations Y miss . Statistics of interest could then be computed using the observed and the imputed missing data (e.g., Gelman et al. 2013, Eq. 8.5). It is important to note that Eq. 3 is used for inference when designs are non-ignorable. Inference can be made more simply when designs are ignorable, as we describe next.

WHEN ARE DESIGNS IGNORABLE?
Sampling designs are ignorable when information about how the data were collected is properly included in the model used for analysis. Bayesian inference for ignorable designs can be made on unobserved quantities of interest θ by conditioning solely on the observed data and covariates omitting Q using In this case, [θ | X, Y obs ] (from Eq. 4) equals [θ | X, Y obs , Q] (from Eq. 3).
Monitoring designs that are ignorable without covariates are limited to simple random sampling. In this case, However, monitoring programs do not typically collect simple random samples, which means that sampling designs are ignorable only by conditioning on covariates so that The design is ignorable when Eq. 5 is satisfied (Gelman et al. 2013, p. 203) and when φ and θ are independent in the prior distribution [φ | X, θ ] = [φ | X]. It follows that the central challenge in Bayesian inference using complex designs is assuring that the likelihood [Y obs | X, θ ] contains design covariates (or equivalent subscripting of parameters) such that Eq. 5 is true. It is generally impossible to prove this equality when missing data arise unintentionally, as is the case, for example, in non-response to a survey (Gelman and Hill 2009, page 530). However, it is possible to test whether such missing data are ignorable using the approach of Irvine et al. (2018, supplement 1). We do not treat analysis of unintentionally missing data, although many of the principles we describe are germane to their analysis.

FINITE SAMPLING
Ignorability is particularly relevant to monitoring data because these data are often composed of samples from a finite population. Finite populations can be thought of heuristically as those for which the observed sample units comprise a relatively large fraction of the total number of units that could be sampled without replacement. In this case, the design is not ignorable even when the sample is completely random because information about the sample size n and the population size N must be included in the analysis. Traditionally, finite population inference was achieved by the Horvitz-Thompson estimator for predictions of means and variances using the observed data by including n and N in the inclusion probabilities. By contrast, a Bayesian approach to finite sample inference is to model the complete data, including observed and missing values (Link and Barker 2010;Gelman et al. 2013). Missing values are simulated from the uniquely Bayesian posterior predictive distribution. Inference then proceeds using the complete data likelihood (Eq. 1) as illustrated in Sect. 4.3.

NOTATION FOR IGNORABLE DESIGNS
Proper models for analysis of non-ignorable missing data can be specified equivalently by the matrix X or by appropriate subscripting of parameters and covariates corresponding to indexing of design covariates included in X. The subscripting approach is somewhat more conventional and transparent. We will use subscripts rather than a design matrix to specify the models that follow.

MODEL-BASED INFERENCE
Inference on status and trend in indicators was needed at three scales: site, stratum, and park. We first discuss models for individual sites within strata. Next, we discuss stratum-level inference before turning to inference at the park scale. The general framework described in this section deals with three sources of non-ignorable missingness (items 1, 2, and 3, Sect. 2.3). The approach we outline required the assumption that Eq. 5 is satisfied. We describe how we assured the equality of Eq. 5 in the sections that follow.

Inference on Sites
We modeled data from sites within strata using a general, hierarchical formulation for the posterior and joint distribution of unobserved quantities Bracket notation (Gelfand and Smith 1990) implies that any distribution or density appropriate for the support of the random variable y i jkt could be used. Generality in notation is achieved using the moment matching function h() that returns the parameters of a distribution given its first and second central moments (Hobbs and Hooten 2015, section 3.4.4). The subscript i = 1, 2, ..., n j indexes observations within site j; j = 1, 2, ..., J k indexes sites within stratum k, and k = 1, 2, ..., K indexes strata within parks. The sequential year of observation is given by t = 1, 2, ..., T where T is the total number of years of monitoring. The design covariate x jk specifies years during which site j in stratum k was sampled according to its revisit schedule, where one is the first year of sampling. So, for example, x jk = (1, 5, 10) if the site was sampled in years one, five, and ten. The parameter α 0 jk is the intercept, and α 1 jk is the trend slope in a generalized linear model (linear, exponential, or logit −1 ) appropriate for the data, notated by the link function g(). We note that although we used generalized linear models with a linear term for temporal trend, any function, linear or nonlinear, could be used in place of Eq. 6. Covariates, w jkt , also were chosen to explain spatial and temporal variation in the response, and β represented the fixed effects of these covariates. It should be noted that the coefficients β can be modeled as fixed at the park scale, fixed or random at the stratum scale, β k , or as random site-level effects, β jk . We typically invoked the simplest park-level fixed effect coefficients unless circumstances (e.g., an expectation of meaningfully different ecological processes operating in each stratum) called for a more complicated model. The covariates w jkt also included values needed to assure that missing data were ignorable (Eq. 5) when probabilities of inclusion were not equal within strata and years as we illustrate in Sect. 5.3. The flexibility of our approach derives from the ability to modify Eqs. 6 and 7 to accommodate idiosyncrasies in the design and the observations y i jkt . Each site within a stratum was modeled with its own intercept α 0 jk , trend slope α 1 jk , and, usually, its own variance σ 2 jk . Examples of covariates include mechanistic predictors such as annual, seasonal, or monthly data on temperature, precipitation, growing degree days, water balance, nitrogen deposition, and soil pH. Covariates also included design related measures such as cost of travel to sites. Some of these covariates were derived using remotely sensed or gridded spatial datasets; others were measured at each site. Covariates were also used to account for effects of "nuisance" variables, such as changes in observers within the sequence of years.
Intercepts and trend slopes for each site within a stratum were modeled as random variables arising from a joint, stratum-level distribution with vector of means μ α k and covariance matrix k , α jk ∼ normal(μ α k , k ), requiring the assumption that sites were exchangeable. We also treated intercepts as random and trend slopes as fixed within each stratum and used model selection  to decide whether random slopes improved the predictive ability of the model. The β coefficients were considered fixed at park scale to allow for covariates that could only be obtained for parks (not sites) and to provide for a simpler model. However, their variance-covariance with the intercept and time slope could be modeled with minor modification to Eq. (7).
Priors on all parameters were specified to be vague. Priors on model coefficients were normal centered on zero. Variance of these priors was set to assure that dispersion of the prior was much larger than the dispersion of the marginal posterior of the coefficients, except in the case of inverse logit models, where the variance was set to assure a flat distribution of the prediction of proportions . Priors on variances were broad uniform or gamma distributions. Analysis of sensitivity to priors revealed no meaningful effects of priors on marginal posterior distributions of model parameters.

DATA MISSING BY DESIGN IN TIME
Gaps in data over time created by the revisit schedule ( Fig. 2b) created missingness for each site that was non-ignorable without conditioning on covariates (item 1, Sect. 2.3). The design covariate x jk assured the design was ignorable across years within strata (Eq. 5): Inference on indicators was needed for each site for each year to assess annual status. We used the posterior predictive distribution of the meanŷ jkt of predictions of new observations y new i jkt at each site to indicate status and associated uncertainty at each site during each year: The vector θ k includes the p unobserved quantities in Eq. 7. We approximated with The superscript m indexes a sample in the converged output of a Markov chain Monte Carlo (MCMC) algorithm, and M is the total number of samples in the output. The values of t in Eqs. 9 and 10 span all of the observation periods, t = 1, 2, ..., T , some of which are missing from the design covariate x jk as a result of the revisit schedule (Fig. 2b). Consequently,ŷ jkt are out of sample predictions for the missing years.
Posterior predictions (Eq. 9) depend on fully observed covariates, which may be missing for covariates physically measured at each site (in contrast to remotely sensed ones) during years when sites were not sampled. Finite population inference also requires covariate values for sites that were never sampled. In both cases, the complete data likelihood uses the data we have about the covariate to model the "data we wish we had" (Link and Barker 2010). For example, we might model the covariate hierarchically When the covariate is observed (w jkt , t ∈ x jkt ), it is used to estimate the parameters of the covariate distribution (γ ) and its hyper-parameters (μ γ 0k , ς 2 γ 0k ). When the covariate is unobserved (w jkt , t / ∈ x jkt ), its value is approximated by draws from its distribution (Eq. 12). The example here models a temporal trend in the covariate, but other models, including a mean model, might be used. We include a specific example of modeling missing covariates in Sect. 5.2.

DATA MISSING BY DESIGN IN SPACE
Predictions of indicators at the stratum scale required finite population inference because the number of sites sampled within stratum k, J k , was drawn from a finite population with N k possible samples. Stratum-level meansŷ kt were approximated using the integrated, complete data likelihood (Eq. 3) by drawing posterior samples from the observed and missing sitelevel meansŷ jkt as follows. We made a drawj m k from discrete uniform(1, N k ) at each MCMC iteration. Dropping k and m fromj m k to reduce notational clutter, we computed The quantitiesα m 0 andα m 1 were samples from their respective distributions at iteration m. Covariate values for unsampled sitesj > J k ≤ N k were obtained from spatially gridded data corresponding to the location of sitej or were modeled using draws of missing covariate values, w˜j kt , by making a draw of γ 0 jk from the hyper-distribution of the covariate (Eqs. 11 and 12).
Equation 13 shows that the distribution ofŷ m jkt converges on the infinite population case when N k J k . Because random draws from discrete uniform(1, N k ) assure that observed sites are sampled in the MCMC in proportion to J k /N k and missing sites are sampled in proportion to (N k − J k )/N k for large M, the stratum-level mean for a finite population computed from the complete data likelihood (Gelman et al. 2013). Analysis at the park scale was not ignorable because samples were allocated following a stratified random design with sites unequally weighted for sampling among strata (item 2, Sect. 2.3). Thus, inference on the mean of indicators for each year across strata required information about how the data were collected. A weighted mean of model predictions across strata (Gelman et al. 2013, page 207) for each year was computed as (15) where N k is the number of potential sampling sites within stratum k and N is the total number of potential sites across all strata. It is not necessary that J k / K k=1 J k = N k /N because finite population Bayesian inference adjusts for over-or under-sampling of strata (Gelman et al. 2013, page 207). Inference on trend at park scales was based on the weighted mean of the trend slope The posterior distributions ofŷ t,park and μ α 1,park were approximated by computing Eqs. 15 and 16 as derived quantities  at each MCMC iteration. There were typically an insufficient number of strata (K ≤ 3) to model μ α 1,park hierarchically. It would be possible to model intercepts as parameters of the hyper-distribution μ α k ∼ normal(α park , park ) if there were a sufficient number of strata and we could assume that the μ α k are exchangeable among strata. In this case, park-level means would be approximated by first drawing a stratum indexk m with probability N K /N , computing the mean for a site within stratumk using Eq. 13, and computingŷ t,park = 1/M M m=1ŷ m jkt . The weighted approach in Eq. 16 and the hierarchical approach have been shown to give similar results (Gelman et al. 2013, Figure 8.1).

UNEQUAL PROBABILITIES OF INCLUSION
The monitoring data we analyzed usually had equal probabilities of inclusion within strata, π jkt = J kt /N k , which meant that missingness created by design within strata and within years was ignorable (Eq. 5). However, unequal probability of sampling is not unusual in monitoring data sets (item 4, Sect. 2.3). In this case, the covariates describing the model of probability of inclusion in the design must be properly included in the analysis, even if we are not interested in the explanatory value of those covariates (Gelman et al. 2013, page 200 ). We illustrate such a model in one of the examples that follow (Sect. 5.3).

APPLICATIONS
Here, we offer three illustrations of the flexibility of Bayesian hierarchical models for dealing with non-ignorable missing data. Predictions and computations of estimands of interest were based on equations outlined in Sects. 4.2 and 4.3 and are not discussed further here. Instead, the examples here add additional non-ignorable challenges frequently encountered in long-term monitoring designs. In the first examples, we treat the problem of covariates needed for prediction over time that are missing as a result of revisit designs as well as the problem of episodic changes in observers or instrumentation. Next, we deal with unequal probabilities of inclusion of sample sites within strata. Finally, we model data from designs that include censoring and truncation. In all of these cases, proper inference was obtained with relatively minor modifications of Eq. 7.

MODEL FITTING
Predictor variables were standardized. Coefficients are reported on the standardized scale unless transformations to enhance their interpretation are specifically noted. Samples from the marginal posterior distributions of unobserved quantities were obtained using an MCMC algorithm implemented using JAGS v4.3.0 (Plummer 2003) and the rjags package (Plummer 2019) in R (R Core Team 2020). Three chains were accumulated for each model. Convergence was assured by inspection of trace plots and by the diagnostics of Brooks and Gelman (1997). We used posterior predictive checks (Gelman et al. 2013;Conn et al. 2018) to assess lack of fit. Models with Bayesian p-values ≤ 0.3 or ≥ 0.7 were modified to obtain adequate fit. We examined spatial dependence in model residuals using variograms (Cressie 1993), which did not reveal spatial structure for any of the data sets we analyzed. We selected between models with group effects for the intercept and group effects for intercept and slope using posterior predictive loss and the deviance information criterion . Coefficients and other quantities of interest were summarized using the median and 95% highest posterior density interval.

CHANGES IN OBSERVERS AND MISSING COVARIATES
Species richness is a measure of the number of species found in a plant community. We developed a model of native species richness at Little Bighorn Battlefield National Monument, Montana (Fig. 1). Data were yearly counts of native species in 10, 1-m 2 plots at each site. The design created three sources of non-ignorable missingness: revisit designs, stratification, and a finite population, Sect. 2.3.
All information needed to approximate posterior distributions for intercepts and trend slopes (α jk , Eq. 6) was known because the design was stratified random with a specified revisit schedule. However, we add a nuance in this example, the inclusion of explanatory covariates measured at the site, thus creating non-ignorable missingness in the response and covariate data. The covariates for the species richness model included site-level data on the cover of non-native species and a 0-1 indicator for botanist.
The observer covariate, w 1 jkt , represents a problem frequently encountered in monitoring programs-changes in personnel collecting data. Such changes are particularly problematic when observers have different levels of experience or training. These shifts in observers can produce spurious changes over time in the response if not properly modeled. In the example here, a botanist less familiar with the local plants in the first two years of observations (w 1 jkt = 1) was replaced by one more familiar in subsequent years (w 1 jkt = 0). We also included line-point intercept data on the cover of non-native species at each site, w 2 jkt , to understand how competition with a non-native species may influence the number of native species in the community.
Inference on trend at the mean of the standardized covariates did not require values for covariates during all years, because all are zero at their (standardized) means. However, posterior prediction of means of sites (Eq. 10), including the effects of covariates, required values for all years, including years when sites were not sampled. Finite population inference also required covariate values for sites that were never sampled. These requirements are easily met for any gridded covariates appearing in the model, and for nuisance variables such as observer, which can sensibly be set to the reference level (w 1 jkt = 0). However, data on non-native species cover were collected according to the same revisit schedule as native species richness. As a result, these covariate values were missing during some years at sampled sites and were missing entirely for all years at sites that were never sampled.
We modeled this missingness using w 2i jkt ∼ binomial(r, p i jkt ), where p i jkt = logit −1 (γ 0 jk + γ 1k t + v jkt δ + i jkt ); r = 213 is the total number of point intercepts at each site at which non-native cover was evaluated, and v jkt included the same 0-1 indicator for botanist as well as a measure of total antecedent rainfall derived using a gridded data product. We used i jkt ∼ normal(0, σ 2 w 2 jk ) to account for overdispersion in non-native cover data. We expanded the likelihood in Eq. 7 to include the covariate distribution Including the distribution of the non-native cover covariate in Eq. 17 allowed posterior prediction for all sites in all years (Eqs. 9 and 10). The observed value was used during years when sites were visited and draws from the distribution of values (Eq. 18) informed years when sites were not visited. Comparison of model predictions and Horvitz-Thompson estimates revealed important differences between the two approaches. Horvitz-Thompson means and confidence intervals were computed using the spsurvey package in R (Kincaid and Olsen 2019) on the means of observations at each site because spsurvey is not able to model observations within sites, and because it requires site means to justify using a normal likelihood to model count data. Using the site means as perfectly observed data ignores the hierarchical structure in the observations (Fig. 2a). However, the number of observations within sites is not ignorable and must be included in the analysis to represent sampling uncertainty with sites. Failing to model this sampling error produces confidence intervals that are misleadingly narrow. In contrast, the broader Bayesian credible intervals appropriately reflect sampling uncertainty at the site level (Fig. 3).
Although Horvitz-Thompson estimators provide an indication of status, they do not estimate trend (Fig. 3) and cannot account for variables that may explain temporal variation in responses. The Bayesian model suggests that the actual value of native species richness during the first year of sampling was likely much higher than the Horvitz-Thompson estimate, largely as a result of a less experienced botanist identifying 19% fewer species, on average, than the more experienced successor in later years: exp(β 1 ) = 0.81 (0.74, 0.88). The Bayesian model separates changes induced by the observing system from "true" changes in the status of ecological indicators.
Park managers often want to know how alternative actions might affect the trajectory of an indicator of park health. Informing their actions requires covariates that can explain trend. We computed the difference between the posterior distribution of mean native species richness and its expected value under different management scenarios to convey the overall effect of the increasing trend in non-native species cover on native species richness. Specifically, we compared richness estimates conditioned on the actual, time-varying, and increasing cover of non-native vegetation, which increased from 0.15 (0.01, 0.47) to 0.34 (0.08, 0.69) between the first and final year of monitoring, to scenarios in which non-native species cover was held at an alternative value. Figure 4 shows the difference between mean native species richness based on observed non-native cover and a hypothetical scenario in which non-native cover was managed to remain at 5% cover over the period of observation. The results suggest the gains in native species diversity that might be expected with management intervention and, conversely, how much might be lost if no action is taken.

UNEQUAL PROBABILITIES OF INCLUSION
Probabilities of inclusion of sample units in designs for monitoring programs are sometimes chosen based on a fully observed covariate. For example, the probability of choosing a site for sampling may be inversely related to the travel cost required to obtain a sample from the site. In this case, the treatment assignment mechanism can be ignored if the model used for inference includes the travel cost covariate (Sect. 3.2).
We modeled soil stability at Organ Pipe Cactus National Monument, Arizona (Fig. 1). There were four sources of non-ignorable missingness in this application (revisit designs, stratification, a finite population, and unequal probabilities of inclusion, Sect. 2.3). The probability of inclusion for a site within a stratum was inversely related to the estimated hiking time to each site. Hiking times were binned into five categories, and probabilities of Figure 4. a The Bayesian model of native species richness allowed mean richness to co-vary with a suspected driver, the cover of non-native species. Data on non-native species were collected only during site visits and, thus, had to be modeled. The complete model for richness, which included a modeled covariate, was used to evaluate the likely outcomes of various management scenarios. For instance, in b we see the difference between native species richness predicted from observations and a prediction in which non-native species cover was hypothetically managed to remain at relatively low (i.e., 5%) cover. Negative values indicate that species richness is, in reality, lower than it could have been had non-native cover been suppressed. All quantities were estimated using methods for small, finite populations (Eq. 13) .
inclusion were assigned to each category. Thus, the parameters in the inclusion model φ were known cut-points at the bounds of each category. Making the design ignorable required including hiking time as a fixed effect in the model (Eq. 5).
Observations of soil stability were collected in six ordered classes from low to high using a modification of Herrick et al. (2005). Stability class data were used to infer the properties of the latent, continuously distributed random variable soil stability from which the categories arise (Agresti 2012). Thus, the latent environmental process and its variance were modeled separately from the observation process and its variance (Fig. 5a). Specifically, observations were assumed to arise as a realization of y i jkt = categorical(p i jkt ) where p is a D length vector specifying the probability an observation falls in category d, p d = Pr(y i jkt = d).
We specified the likelihood in Eq. 7 as We assume that the true, unobserved, continuous value of soil stability z i jkt arises from a normal distribution with parameters μ jkt (Eq. 6) and σ 2 jk . The z i jkt , in turn, give rise to the observations with the known bounds χ 1 , χ 2 , ..., χ D of the soil stability classes 1, 2, ..., D. The covariate vector w jk (Eq. 6) included gridded estimates of hiking time used to allocate sites to strata and total precipitation in the previous water year.
We compared the Bayesian inference described above with categorical Horvitz-Thompson estimates developed using spsurvey. Inputs to the Horvitz-Thompson estimator were annual summaries of site-level median soil stability class. Small sample sizes lim-ited the range of classes represented in the data in any given year, the effects of which are evident in park-wide Horvitz-Thompson estimates of the proportion of the landscape in each class (Fig. 5b). For instance, site-level medians in the first year of sampling spanned only two of the six stability classes. In contrast, the Bayesian model used all of the data, including individual stability class observations within sites, which produced estimates that better reflected the data by representing the full range of sampling variability (Fig. 5b). We also compared predictions from a model that inappropriately ignores cost of travel to sites to a model that includes hiking time as a covariate. Both of these models (both Bayesian) included previous water year precipitation as an additional covariate to investigate meteorological effects on soil stability mediated by primary productivity and soil organic matter. The results illustrate how ignoring unequal probabilities of inclusion of samples can bias inference (Fig. 5c).

CENSORED AND TRUNCATED DATA
Environmental data are sometimes censored. Censored observations specify that the true, unobserved value of a response of interest is above, below, or between known cut-points. In contrast to censoring, truncation involves deliberate decisions not to collect certain types of observations. For instance, trees below a certain size class might not be measured. As a result, truncation leads to missing information on ecological quantities within some range. Censoring and truncation create designs that are non-ignorable and known (Gelman et al. 2013, page 204). Existing approaches to designed-based inference do not support inference from these types of data (Fox et al. 2015).
We modeled simultaneously censored and truncated data on canopy gaps from grasslands at Capitol Reef National Park, Utah (Fig. 1). Non-ignorable aspects of the design included revisit sampling, stratification, and partially missing data, Sect. 2.3. Canopy gaps are used as an indicator of wind erosion potential (Webb et al. 2020) in long-term monitoring programs throughout arid and semiarid regions of the western USA (Herrick et al. 2005). Canopy gaps were defined as the length (cm) between intersections with the outermost edge of perennial plant canopies along 50 meter transects. Observations were censored whenever a canopy gap spanned either end of the transect. In these cases, we know that the gap was at least as large as the distance between the transect end and the nearest canopy edge, but we do not know the true gap size. Data were truncated as well as censored because canopy gaps smaller than 20 cm were not recorded.
We modified the modeling framework (Eq. 7) as follows. Each site had a vector of data for each visit u jkt with value one if an observation spanned either end of a transect and hence was censored, and zero when the observation occurred wholly within the transect and was not censored. The data vector y jkt contained the measurements of uncensored canopy gaps and a code indicating "missing" for censored observations. A third data vector c jkt contained the cut-point for censoring, that is, the distance between a canopy edge and an end of the transect and a code for missing for the data that were uncensored. We specified the posterior and joint distribution as we can reasonably assume that the value of u i jkt is conditionally independent of θ k when the values of y i jkt and c i jkt are known. Figure 5. a Soil stability is a continuous latent variable, the distribution of which is inferred from categorical observations. Histograms of the data (soil stability observations assigned to a stability class, 1-6) are shown against draws from the posterior distribution of mean soil stability over the observation period (semitransparent lines). No sites were sampled in 2014. b A comparison of Bayesian and categorical Horvitz-Thompson estimates of the probability of soil stability falling in a given class. c The 80% highest posterior density interval for the Bayesian finite population estimates of mean soil stability at park and stratum scales (for two strata, V101 and V103) under two modeling scenarios: an analysis in which the determinant of the inclusion probability of sites, hiking time, was modeled (white interval), and the other in which it was ignored (black interval). The gray areas correspond to values at which the credible intervals for the two models overlap. Precipitation was held at its mean in a whereas predictions in b and c were conditioned on observed values .
We specified [u i jkt | y i jkt , c i jkt ] by first defining the indicator function We used this function to formulate f (u i jkt | y i jkt , c i jkt ) = 1 i fu i jkt = 0 I c i jkt (y i jkt ) if u i jkt = 1, which we used to form a joint likelihood in Eq. 7 where T(20, ∞) indicates truncation of the left tail of the gamma distribution at 20 cm.
Equation 19 can be understood this way. The observation of gap size y i jkt is missing when the data are censored (u i jkt = 1). In this case, the likelihood [u i jkt | y i jkt , c i jkt ] causes the observation to be imputed from the distribution of y i jkt to the right of c i jkt . Imputation is not necessary when the data are not censored (u i jkt = 0); the likelihood [u i jkt | y i jkt , c i jkt ] evaluates to one, and the joint likelihood simplifies to a single component appropriate for the uncensored data.
We compared analysis from the model including censoring and truncation (Eq. 19) with analysis from two models ignoring them. Omitting censoring and truncation produced faulty inference (Fig. 6). The Horvitz-Thompson estimates, which ignored incomplete observations, once again revealed excessively narrow confidence intervals (Fig. 6a) relative to the Bayesian credible intervals. We also compared two Bayesian models, the first of which accounted for censored and truncated observations, the second of which ignored them. Mean canopy gap size was similar for the two approaches: 154 (41, 346) cm vs. 152 (57, 294) cm (Fig. 6b). However, the standard deviation of canopy gap size was substantially higher when censored and truncated observations were modeled appropriately: 240 (34, 532) cm versus 172 (37, 375) cm when ignored (Fig. 6b). This is likely because modeling censoring and truncation acknowledges that censored gap size observations represent their known lower limits, not their "true" sizes, and that gaps < 20 cm were present, but unmeasured.
When censor limits are included in the model, the lengths of censored gaps are imputed on the basis of the estimate of all model parameters at each iteration of the MCMC. Inputted observations are at least as large as the censor limits. Thus, the largest observations when censoring is properly modeled exceed those obtained when censoring is ignored: 9090 (4144, 20,595) cm versus 4979 (2599, 9964) cm (Fig. 6b). It can also be shown that it is precisely the largest gaps that tend to be among the censored observations. This result has clear significance for management where erosion is a concern. A model that ignores censoring and truncation will underestimate the proportion of large gaps, which are especially prone to erosion. A manager misled by an incomplete analysis might believe that erosive potential of the landscapes is lower than it actually is. Figure 6. a A comparison of Bayesian (gray error bars, dashed gray line) and Horvitz-Thompson (black) estimates of mean canopy gap size over a 10-year period in an analysis of data from a single stratum ("Hartnet-deep grassland") at Capitol Reef National Park. Error bars show the 95% confidence intervals for the Horvitz-Thompson means and 95% highest posterior density intervals for the Bayesian predictions. Model estimates are seen against the uncensored gap size measurements in each year. Observations, all of which appear above the lower truncation limit at 20 cm, are shown on a logarithmic scale and were jittered to avoid obscuring individual values. b Differences between summary measures of canopy gap size from two Bayesian models, one in which censoring and truncation in the data are modeled (gray probability density estimates) versus another in which they are ignored (white). Summaries include the posterior distributions of the mean (left), standard deviation (center), and maximum predicted canopy gaps size (right) .

DISCUSSION
Bayesian hierarchical models offer a coherent and flexible approach to modeling observations that are "missing by design." We illustrated the connection between sampling design and analysis in five ways (Sect. 2.3) using data from the National Park Service Inventory and Monitoring program. The primary advantages of the Bayesian hierarchical approach for analyzing monitoring data are (1) the use of posterior predictions to fill in gaps in data missing by design, (2) formal inference on temporal trend, (3) the ability to model explanatory covariates, (4) a rigorous approach for inference on finite populations based on the complete data likelihood, and (5) the ability to model the observation process separately from the environmental process. Several of these advantages were also identified by Shanahan et al. (2021). There are disadvantages, however. Model building, coding, and checking require greater effort than simple computation of Horvitz-Thompson means and variances.
Relatively large data sets are required because many more parameters are modeled relative to the Horvitz-Thompson approach.
The Bayesian approach provides a particularly useful way to model temporal trend from sites with unequal frequency of sampling. The hierarchical model in Eq. 7 and its implementation in posterior prediction (Eqs. 9 and 10) support "borrowing strength"  for inference on the intercept and trend slope for each site. Sites with low uncertainty in estimates of α jk have a greater influence on μ α k than sites with high uncertainty. Borrowing strength provides a natural weighting of influence of sites sampled in revisit designs.
Environmental managers often need inference that extends beyond simple means and variances describing the status of resources at different points in time. This need can only be met by model-based inference. Inferring trend from observing a series of means and confidence intervals over time in the absence of a properly fit model can produce spurious inferences, as we illustrated in Sect. 5.2. A related issue is that merely knowing that change is occurring, as might be inferred from a time-slope term in a model, is rarely sufficient for informed decision making. An ability to infer the drivers of change enabled by modeling coefficients of time-varying covariates is also needed (Shanahan et al. 2021) to inform managers of potential actions that could be taken to modify the trajectory of indicators of park health.
Design-based analysis using Horvitz-Thompson means and variances was originally motivated by the need for inference on finite populations Thompson 1951, 1952), a challenge frequently confronted by monitoring programs. Generalized linear models supporting inference on trend can be fit to data from complex sampling designs in the maximum likelihood framework using sample weights to provide information about the design that must be included in the analysis (Lumley 2004(Lumley , 2015Lumley and Scott 2017). Although these approaches have been used primarily for data in the social sciences, there is no reason they could not be applied to complex designs used for monitoring natural resources (e.g., Irvine et al. 2018). However, inferences from generalized linear models using sampling weights must be marginalized across levels in multi-level, hierarchical models (Hester 1991;Lumley and Scott 2017). For example, it would not be possible to estimate separately trend for site, strata, and park levels using sampling weights. The need to marginalize across levels strongly limits inference from this type of analysis. The Bayesian approach we describe does not suffer from this limitation.
Modern statistical analysis requires software (Ripley 2002). This ubiquitous need can tempt analysts of monitoring data to use existing statistical packages in ways that fail to match the data and the way they were collected to the algorithms implemented in the software. Such failures might include applying analyses assuming normal distributions to observations with strictly nonnegative or discrete support, ignoring the need for mixture models as would occur with zero inflation, treating categorical data as metric, discarding censored data, and failing to respect the finite nature of a population, among others. Overlooking these unique features of data can be recommended only on the basis of its convenience. We have properly modeled all of these idiosyncrasies using relatively minor modifications of Eq. 7.
The analysis of environmental monitoring data has historically used design-based inference Thompson 1951, 1952) that lacks the flexibility to deal with the diverse challenges that inevitably arise in analysis of large monitoring data sets. The concept of ignorability provides a general theoretical foundation for constructing model-based inference that appropriately includes information about the way data were collected in the analysis. The heart of this challenge is to model data that are "missing by design." The Bayesian, hierarchical approach can rise to this challenge, thereby offering a broadly useful framework for analyzing data collected by environmental monitoring programs.