Background

Domestic livestock is recognized ecosystem engineers in semi-arid rangelands, where they directly and indirectly alter the availability of resources to a wide range of grassland-associated organisms (Derner et al. 2009). Several studies cite the influence of vegetative changes due to livestock grazing on breeding grassland birds since this species group is heavily influenced by vegetative structure (Askins et al. 2007; Brennan and Kuvlesky Jr 2005; Fuhlendorf et al. 2006; Jansen et al. 1999). Grazing pressure has historically been difficult to quantify due to variable plant responses to grazing and movements of livestock within pastures (Landsberg and Crowley 2004). However, a review of the effects of water-place distribution on rangelands suggested that distances from water sources (e.g., livestock tanks and troughs) can provide valuable context for interpreting changes in grazed landscapes particularly in areas remote from water sources (James et al. 1999; Landsberg and Crowley 2004; Ludwig et al. 2000). In South Texas, water sources are scarce. Specifically, the Coastal Sand Plain region of Texas has no natural permanent bodies of freshwater making livestock wells and holding tanks supplied by active ranching operations the only water source for domestic livestock and, thus, it is reasonable to expect water points to have a substantial impact on the distribution and intensity of localized grazing (Fulbright et al. 1990; Snelgrove et al. 2013).

Grassland bird populations have experienced precipitous declines on a continental scale over the last few decades (Brennan and Kuvlesky Jr 2005; Nocera and Koslowsky 2011). Although agricultural and livestock operations dominate the South Texas landscape, wildlife-related recreation has become increasingly important to landowners because of the associated economic value (Dodd 2009; TPWD 2016). For example, landowners can anticipate an average gross profit of $4.69 per hectare for a deer or exotic ungulate hunting lease and can expect an average gross profit of $20.99 per hectare for a quail (e.g., Northern Bobwhite Colinus virginianus and Scaled Quail Callipepla squamata) hunting lease (TPWD 2017). This area also provides crucial resources for other migratory and resident grassland birds (e.g., Cassin’s Sparrow Aimophila cassinii, Grasshopper Sparrow Ammodramus savannarum, and Dicksissel Spiza americana) that have declined throughout their ranges due to land use and climate change since 1966 (Brennan and Kuvlesky Jr 2005; Knopf 1994). It is essential we advance our understanding of how grassland birds are affected by their environment, inclusive of both their requirements to persist (i.e., resources), and how they interact with environmental features or biotic influences.

Traditionally, species distribution models (SDMs), which statistically associate a species’ occurrence with a suite of geospatial predictors, use direct variables, resources that the animal consumes or requires to persist in an area (e.g., shrub density and water availability) to define and project a species’ niche and distribution (Austin and Van Niel 2011; Elith and Leathwick 2009). Indirect variables, which are features the animal does not consume or require for persistence but with which it may still interact (e.g., competition and commensalism), are often excluded in SDMs due to the difficulty in identifying the variable ecologically, quantifying the relationship, or in managing collinearity issues adding an interaction factor to a traditional SDM framework may contribute (Atauchi et al. 2018; Leach et al. 2016; Austin and Van Niel 2011). However, recent advances in machine learning algorithms (e.g., random forest) have enabled us to include indirect variables, such as grazing pressure, in SDMs that may have more complicated relationships with the distribution of the target species than traditional resource variables (Miller 2010).

Our objective was to (1) improve traditional SDMs projecting the distribution of three summer resident South Texas grassland birds (Northern Bobwhite Colinus virginianus, Eastern Meadowlark Sturnella magna, and Cassin’s Sparrow Peucaea cassinii) by incorporating livestock grazing pressure, an indirect variable, and (2) interpret the possible effect of grazing pressure on bird distribution per each SDM approach. We used a novel approach to spatially quantify localized grazing pressure to include this variable using five SDM algorithms: BioClim, generalized linear model, MaxEnt, boosted regression tree, and random forest. Our approach serves as a valuable tool for rangeland managers when the management goal is to promote sustainable livestock grazing and recreational wildlife harvest, while maintaining viable nongame species.

Methods

Study site

The Coloraditas Grazing Research and Demonstration Area (CGRDA) is a 7684-ha area located on the 60,000-ha San Antonio Viejo Ranch (SAV) approximately 25 km south of Hebbronville, Texas, in Jim Hogg and Starr counties (Fig. 1). SAV is located within the South Texas Plains ecoregion and is managed predominantly as a cow-calf operation. Mean annual temperature within the study site is 22.6 °C, and mean annual precipitation is 502.5 mm (PRISM Climate Group 2018). SAV is one of six properties of the East Foundation that are managed as a living laboratory to support wildlife conservation and other public benefits of ranching and private land stewardship. The CGRDA is the representative of South Texas rangeland ecosystems and encompasses the Coastal Sand Plain and Texas-Tamulipan Thronscrub ecoregions. Low-growing woody plants, dense shrubs (Prosopis glandulosa, Acacia greggii, Celtis ehrenbergiana, Colubrina texensis, Aloysia gratissima, Lantana urticoides), and cacti (Opuntia engelmannii var. lindheimeri, Opuntia leptocaulis) dominate the vegetation in this area. The CGRDA is comprised of 10 pastures each assigned to 1 of 4 grazing systems (Fig. 1). Four pastures were assigned to a continuous grazing system with 2 pastures (Rodeo and Tia Nena) maintained under a high stocking rate (1 Animal Unit [AU] /14 ha) and 2 pastures (San Juan and Calichera) under a moderate stocking rate (1 AU/20 ha). Six pastures were assigned to a rotational system with 3 pastures, 1 herd maintained under the high stocking rate (Coloraditas, Desiderio, and Guadalupe units) and 3 pastures, and 1 herd maintained under the moderate stocking rate (San Rafael, Loma, and Tequileras units). Grazing was deferred on all pastures for 2 years prior to the onset of livestock grazing in December 2015.

Fig. 1
figure 1

Locality and pasture composition of East Foundation’s Coloraditas Grazing Research and Demonstration Area (CGRDA). Four pastures were assigned to a continuous grazing system with 2 pastures (Rodeo and Tia Nena) maintained under a high stocking rate (1 Animal Unit [AU]/14 ha) and 2 pastures (San Juan and Calichera) under a moderate stocking rate (1 AU/20 ha). Six pastures were assigned to a rotational system with 3 pastures, 1 herd maintained under the high stocking rate (Coloraditas, Desiderio, and Guadalupe units) and 3 pastures, and 1 herd maintained under the moderate stocking rate (San Rafael, Loma, and Tequileras units)

Environmental predictors

We used canopy height, shrub density, grass spp. coverage, cacti spp. coverage, and bare ground coverage recorded from ground surveys in 2016 as environmental predictors in SDMs. We collected vegetation composition and structure data from 141 permanent 20-m transects in October 2016. We allocated transects proportional to the area of ecological sites that occur in each pasture using stratified sampling resulting in 12–16 transects per pasture (Bonham 2013). We marked each transect start and collected data in a random, predetermined direction (N, S, E, W). On each transect, we sampled 5, 20 × 50 cm quadrats (5 m spacing) randomly placed at either 0, 0.5, 1, 1.5, 2, or 2.5 m from the left side of the tape and facing away from the transect start, visually recording percent cover of woody, herbaceous (later classified by grass spp.), and bare ground in each quadrat.

We also documented woody canopy cover along each of the 20-m transects by visually recording the amount of the ground (in centimeters) covered by woody plant materials (leaves and branches) and succulent (cacti) that intercepted the line transect by species (Canfield 1941; Higgins et al. 1996). If a gap in the canopy exceeded 0.5 m for an individual, we recorded separate cover measurements. We calculated percent canopy cover by summing the intercept measurements for an individual species, dividing by total line length and converting to a cover percentage. We calculated total percent cover by adding cover percentages for all species, which sometimes exceeded 100% when overlapping canopies by different species were recorded (Coulloudon et al. 1999).

Additionally, we used topographic relief (30-m2 resolution) and Optimized Soil Adjusted Vegetation Index (OSAVI, a measure of LAI) produced from remotely sensed imagery collected during the same growing season as the ground surveys. We acquired one Landsat 8-OLI tile (< 6% cloud cover) that encompassed the study area (courtesy of U.S. Geological Survey) and processed this in ENVI 5.1 (NASA Landsat Program 2016). We corrected for atmospheric conditions and converted the original image format of Digital Numbers (DN) to radiance and then surface reflectance. We first resized the image to the rectangular extent of the CGRDA pasture complex and then extracted by the study area mask in ESRI ArcGIS ArcMap 10.5. We then spatially subset the extracted image by bands 2–5 corresponding to Landsat 8-OLI band designations: blue, green, red, and NIR. Bands were stacked, and the OSAVI was calculated using the band math tool in ENVI 5.1. This index for LAI follows the standard formula [(NIR-Red)/(NIR + Red+0.16)] and uses a reflectance constant of 0.16 to adjust for high background reflectance (e.g., areas with sparse vegetation and high soil reflectance) (Rondeaux et al. 1996). In South Texas, this vegetation index outperformed other, more common vegetation indices (e.g., Normalized Difference Vegetation Index [NDVI]) in overall image classification accuracy and herbaceous coverage estimations (Fern et al. 2018).

Locations of water sources (e.g., livestock wells) within the study site and cattle stocking rates were provided by the East Foundation. To calculate water proximity, we gridded the spatial extent of the CGRDA into a fishnet (30-m2 resolution). We performed a proximity analysis on each pixel centroid using the Near tool in ArcMap 10.5 to determine distance of each centroid to location of nearest water source, usually a livestock well and holding tank as no natural surface water exists within the study site, and very little exists on the Coastal Sand Plain region of Texas as a whole (Snelgrove et al. 2013). We made considerations for seasonality as not all groundwater pumps are operational year-round on large South Texas cattle ranches and ensured only those wells known to be active during the summer of 2016 (a total of 399 wells) were used in the analysis.

Quantifying grazing pressure

Several studies have cited the strong, predictable relationship between localized grazing pressure and proximity to water sources, especially in semi-arid rangelands (James et al. 1999; Landsberg and Crowley 2004; Locatelli et al. 2016; Ludwig et al. 2000). This spatially un-even use of the pasture by the livestock is even visible in satellite imagery as one study termed the zone of high livestock impact attenuating away from each water point (typically a livestock tank fed by a well) a “piosphere” (Andrew 1988). Piospheres are areas of high “hoof-action” and generally have higher accumulation of livestock feces, soil compaction, and defoliation (Andrew and Lange 1986; Graetz and Ludwig 1976). Due to the absence of natural water sources on the CGRDA, the known stocking rates of each pasture, and the well-documented relationship between localized grazing pressure and water sources (livestock tanks and wells) in semi-arid rangelands, we used water proximity to create a surrogate index for localized grazing pressure.

We used the distance to the nearest water source (livestock tank) previously calculated by the proximity analysis and 30-m2 fishnet grid across the CGRDA. This ensured that resulting surface value estimates were the same spatial resolution as the other environmental rasters. We multiplied the distance value (m) of each fishnet pixel centroid by the density of grazing livestock (i.e., stocking rate) in each pasture using the raster math tool in ArcMap. A summary of all predictive layers included in the SDMs is presented in Table 1.

Table 1 Summary of predictor layers included in species distribution models (SDMs)

Bird occurrence data

Avian point counts consisted of 10 12-point transects (centrally located per pasture within the CGRDA). We used point count data collected on the CGRDA from April to June 2016 to build baseline SDMs. Each point was located 400 m apart, and 2 observers recorded visual and auditory occurrences of birds within 200 m of each point simultaneously yet independently. We used occurrence records rather than abundance or density since the distributional modeling algorithms required presence/absence or presence only data. We used a traditional framework in which each occurrence was counted as a “presence” record at each point, omitting the duplicate records from the double observer design, and disregarding the transect construct by subsampling the data by a 400-m cell size. This granted us a finer spatial resolution of the data set to thoroughly investigate the impacts of grazing pressure on grassland bird presence. We used only grassland-obligate species with an adequate number presence records within the CGRDA during the study period for distribution models: Northern Bobwhite, Eastern Meadowlark, and Cassin’s Sparrow.

Data processing and analyses

We imported values for each predictor (canopy height, shrub density, bare ground coverage, grass spp. coverage, cacti spp. coverage, water proximity, and grazing pressure) into ArcMap 10.5 and used Kriging interpolation to minimize spatial sampling bias and create continuous surface layers of environmental predictor values. Kriging, or Gaussian process regression, is a geostatistical method through which interpolated values are modeled by a Gaussian process governed by covariances. This method of spatial interpolation estimates a continuous surface of values directly based on values at surrounding points weighted according to spatial covariance (Van Beers and Kleijnen 2004). The Kriging interpolation algorithm is optimal for most eco-spatial modeling because it produces an unbiased prediction and calculates the spatial distribution of uncertainty allowing for an accurate estimate of error at any particular point (Mahmoudabadi and Briggs 2016). We exported the resulting GeoTIFFs and read these into the R statistical language as raster layers (Core Team 2013). We also read the GeoTIFFs representing the spatial values of elevation and topographic relief into R, and all layers were stacked to create the occurrence predictor rasters for the baseline SDMs.

We imported occurrence data for Northern Bobwhite, Eastern meadlowlark, and Cassin’s Sparrow into R and used the predictor raster stack to build SDMs using five different algorithms: BioClim (BC), generalized linear model (GLM), MaxEnt, boosted regression tree (BRT), and random forest (RF). We used the R package “dismo” to execute BioClim, generalized linear model, and Maxent (Hijmans et al. 2017). We used the R packages “gbm” and “randomForest” to execute boosted regression tree and random forest, respectively (Greenwell et al. 2019; Liaw and Wiener 2002). Table 2 outlines the basic mathematical approach of each modeling algorithm and provides a comparison of the advantages of each model in the occupancy framework. We generated “background data” to produce the non-presence class required by the logistic models. Background data do not attempt to guess at absence locations, but instead are used to characterize the study region (Phillips and Elith 2011; Phillips et al. 2009; Ward et al. 2009). These establish the environmental domain of the study and are independent of occurrence data while presence data establish the conditions under which a species is more likely to be present than a null, or completely random, model would predict. After building baseline SDMs for each species, we added the grazing pressure raster to the occurrence predictor raster stack and re-ran the models to assess any improvement or degradation in the predictive performance of each algorithm. Prior to building SDMs, we performed preliminary analyses for each species to ensure only predictors that added to the explanatory power of the models and did not add to the overall deviance which were used in each SDM. This included the use of a priori Gradient Boosting Machine (GBM) analyses and step-wise regression variable dropping and selection for each model and species.

Table 2 Comparison of mathematical approach for each modeling algorithm being used to project species distributions in this study. Data requirements and advantages are also listed

Model evaluation

We evaluated performance of each model using the area under the receiver operator curve (AUROC or AUC) and true sensitivity statistic (TSS). The AUC (range from 0 to 1) is a measure of rank correlation. In unbiased data, a higher AUC value indicates that areas with high predicted suitability values tend to be sites of known presence (Phillips et al. 2006). The TSS is an approach based on maximizing the sum of sensitivity and specificity independent of species prevalence (Liu et al. 2013). Many distributional model evaluation approaches (e.g., kappa) are threshold-dependent; a value above a user-set threshold indicates a prediction of presence (e.g., any outcome above a 50% [0.50] likelihood indicates presence), and a value below the threshold indicates absence. However, different models assign different weight to false absences or false presences making it hard to compare models directly. The TSS is considered an alternative to the traditionally used kappa to assess model performance, since it has the advantage of being threshold and prevalence independent. This becomes especially meaningful when building SDMs for rare or endangered species that may have low prevalence across a given range or study area as the default threshold, usually 0.5, for many models (e.g., logistic regression-based GLM) may not be appropriate. In these cases, studies have suggested the use of binary species presence/absence maps as input which may be preferred for interpretation in building conservation plans, reservation networks, or sanctuaries as opposed to a continuous representation of probability of species presence (Fernández et al. 2006; Mladenoff and He 1999; Wilson et al. 2005). Although not prevalence independent, the AUC can be valuable in determining optimal threshold criteria. For example, Freeman and Moisen (2008) found that for SDMs projecting distributions of species with high prevalence (50%), default threshold criteria tended to converge. However, for species with low prevalence (e.g., 10%), the threshold where sensitivity + specificity is maximum offered the ideal probability threshold for species presence. In the R workspace output, this is typically read as “Max TPR + TNR” and can be exceedingly valuable for accurately modeling distributions of rare or endangered species.

Results

We recorded a total of 1,565 occurrences for all three species within the CGRDA in the summer of 2016 (Northern Bobwhite = 996, Eastern Meadowlark = 179, Cassin’s Sparrow = 390). Machine learning models (MaxEnt and RF) had the highest combinations of AUC and TSS for all species, with RF being the most consistent for each analysis (Table 3). In comparison of AUC values, the environmental envelope model (BC) and the GLM remained constant or only marginally improved with the addition of the grazing pressure raster. However, the TSS for these algorithms markedly improved with the addition of the grazing pressure raster for the Northern Bobwhite (TSS = +0.93) and Eastern Meadowlark (TSS = +0.08) SDMs (Table 3). The predictive power of both machine learning models and the BRT improved with the addition of the grazing pressure raster for all species, with the exception of MaxEnt and Eastern Meadowlark [Maxent: Northern Bobwhite [AUC = +0.06], Cassin’s Sparrow [AUC = +0.02]; random forest: Northern Bobwhite [AUC = +0.01], Eastern Meadowlark [AUC = +0.05], Cassin’s Sparrow [AUC = +0.02]; random forest: Northern Bobwhite [AUC = +0.03], Eastern Meadowlark [AUC = +0.04], Cassin’s Sparrow [AUC = +0.03]. Random forest had the highest explanatory power (AUC) across all species but was, however, outperformed in sensitivity (TSS) by the other algorithms for all species for models including the grazing pressure raster (Table 3).

Table 3 Results of species distribution model (SDM) performance for Bioclim, generalized linear model (GLM), MaxEnt, boosted regression tree (BRT), and random forest (RF) algorithms in predicting occurrence of Northern Bobwhite (NOBO), Eastern Meadowlark (EAME), and Cassin’s Sparrow (CASP) on East Foundation’s Coloraditas Grazing Research and Demonstration Area (CGRDA) in the summer of 2016. Model performance metrics (area under curve [AUC] and true sensitivity statistic [TSS]) are compared for SDMs using environmental predictors only, and environmental predictors stacked with a raster representing localized grazing pressure (denoted by “+”).

Northern Bobwhite distribution, the species of highest prevalence (n = 996), was best explained by random forest model inclusive of grazing pressure (AUC = 0.84; TSS = 0.48). However, the Bobwhite distribution was better explained by the addition of the grazing pressure raster by all algorithms as evidence in the measurable increase in AUC and TSS in each model (AUC = + 0.01−0.06, TSS = + 0.04−0.93; Table 3).

Eastern Meadowlark distribution, the species of the lowest prevalence (n = 179), was also best explained by the random forest model inclusive of grazing pressure (AUC = 0.95; TSS = 0.67). The SDM explanatory power for this species’ distribution was not improved with the addition of grazing pressure using the BioClim, GLM, and MaxEnt algorithms. Cassin’s Sparrow distribution, the species of moderate prevalence (n = 390), was also best explained by the random forest model inclusive of grazing pressure (AUC = 0.81; TSS = 0.23). However, the SDM explanatory power for this species’ distribution was not improved with the addition of grazing pressure using the GLM algorithm. Additionally, other algorithms (BRT and MaxEnt) produced higher TSS values (TSS = 0.67 and 0.29, respectively).

Discussion

Our novel approach to spatially quantify localized grazing pressure improved the prediction accuracy and sensitivity of SDMs projecting the distribution of Northern Bobwhite, Eastern Meadowlark, and Cassin’s Sparrow. Of the three algorithms used, random forest performed best for explaining presence regardless of species prevalence and should be preferred by rangeland managers seeking to promote sustainable livestock grazing while balancing the needs of sensitive wildlife populations. Random forest models operate on a machine-learning, decision tree mechanism. Thus, the superior performance of RF in this study implies that it is a valuable approach to limited, binary data (e.g., presence/absence). It is important to note the varying model performance with relation to species prevalence. For example, SDMS built to project distributions of Northern Bobwhite, the species of the highest prevalence in this study varied widely in predictive performance (AUC) and sensitivity (TSS) across algorithms. Rangeland managers should consider both metrics (AUC and TSS) when assessing model performance since both provide valuable insight into the over utility of the model (i.e., AUC describing explanatory power and TSS describing model stability or sensitivity to the predictors). Although both AUC and TSS are theoretically prevalence independent, for species like Northern Bobwhite that are often locally abundant where they are present, machine-learning models that can accommodate non-linear relationships (e.g., random forest) should be preferred in modeling distributions. In an ecological context, the improvement in model sensitivity and explanatory power seen with the addition of grazing pressure to Northern Bobwhite SDMs should be considered meaningful by rangeland ecologists. The direct impacts of livestock grazing (e.g., changes in vegetative structure and composition) on the distribution of Northern Bobwhite is well recognized (Baker and Guthery 1990; Coppedge et al. 2008; Flanders et al. 2006; Lusk et al. 2002). However, with the inclusion of grazing pressure as an indirect variable and the subsequent increase in explanatory power across all algorithms (AUC = + 0.01−0.06), our findings suggest this species’ distribution is also indirectly affected by livestock grazing activities. Thus, future investigations into the Northern Bobwhite distribution or populations should consider the presence and localized intensity of livestock grazing.

The addition of grazing pressure as a variable also increased the explanatory power and sensitivity of some SDMs built to project distributions of Cassin’s Sparrow, the species of moderate prevalence in this study (BioClim, MaxEnt, BRT, RF). However, any improvements in the model performance were marginal (AUC = + 0.0−0.4). Our findings suggest indirect effects of livestock grazing on Cassin’s Sparrow presence, though marginally detectable, were negligible. Rangeland managers should consider the unique ecological circumstances of each rangeland and livestock grazing system when investigating Cassin’s Sparrow distribution or presence. Although both machine-learning models (MaxEnt and random forest) and boosted regression tree performed relatively well, compared to the envelope (BioClim) and logistic algorithms (generalized linear model), the BRT produced the highest model sensitivity. This is likely due to the innate accommodation of missing and limited data in this algorithm, which makes it ideal for species of lower (or unknown) prevalence. In these cases, the boosted regression tree provides a superior, yet conservative SDM for rangeland ecologists seeking to project distributions of species with low to moderate or unknown prevalence.

Distributions of Eastern Meadowlark, the species of the lowest prevalence in this study, were better explained by the addition of grazing pressure only in the boosted regression tree and random forest SDMs. Although previous studies have suggested a neutral effect of livestock grazing activity on the presence of Eastern Meadowlark, this species has also been known to alter behavior and be particularly susceptible to brood parasitism (usually by Brown-headed cowbird Molothrus ater) in heavily grazed pastures (Baker and Guthery 1990; Coppedge et al. 2008). Further, Roseberry and Klimstra (1970) found substantial differences in Eastern Meadowlark nest densities between lightly grazed and heavily grazed pastures of similar vegetation composition and area. While direct impacts of livestock grazing (e.g., changes in vegetative structure) may not be as evident in the distributions of this species as they are in others (e.g., Northern Bobwhite), our findings suggest some indirect influence of livestock grazing activity on Eastern Meadowlark presence. The random forest algorithm, in the accommodation of missing data and low presence values, produced the SDM with the highest explanatory power for this species, and it should be preferred for other species of low prevalence.

In a broader context, our investigation of grazing pressure influence on bird distributions could be expanded to explore the effect of other land uses (e.g., urbanization and cultivation) on distributions. For instance, the effect of native prairie versus cultivated cropland on the distribution of birds of the same species or the more complex interaction between disturbance and brood parasites (e.g., Brown-headed cowbird Molothrus ater) and that effect on the distribution of susceptible bird species (Brittingham and Temple 1983). This becomes especially relevant as climate change accelerates the impacts of land-use changes across the landscape (Pielke 2005).

BioClim

This algorithm is traditionally used as an environmental envelope method to model large scale distributions and invasions (Hijmans et al. 2001, 2005). However, recent improvements in the algorithm (in the R package “Dismo” [Hijmans et al. 2017]) have allowed analyses of single species occurrences at finer resolutions. The binary output also makes it especially well-suited for species with low prevalence. For example, it performed best (AUC = 0.81) with the Eastern Meadowlark, the species of the lowest prevalence in this study. For this species, this model did not improve with the addition of grazing pressure as a predictor. Since other models showed improvement with the addition of grazing pressure (BRT and RF), this may suggest some disadvantage to the linearity of this algorithm. BioClim also had the poorest predictive performance (AUC = 0.54; 0.58, with and without grazing pressure, respectively) for Northern Bobwhite. This species had the highest prevalence in the study and, thus, may suggest a saturation limitation for this algorithm as large sample sizes have been recognized to de-stabilize similar models (Mateo et al. 2010).

GLM (binomial)

The SDMs built using this logistic regression-based algorithm, generally, performed poorly, especially for Cassin’s Sparrow (AUC = 0.44). Additionally, GLM SDMs for Eastern Meadowlark and Cassin’s Sparrow did not improve with the addition of grazing pressure despite the improvement seen in other models. Although this algorithm can theoretically accommodate non-linear relationships between predictor and response variables, it has been recognized to over-fit distribution models producing biased or inaccurate results (Austin and Cunningham 1981; Elith and Graham 2009).

MaxEnt

SDMs built using this machine-learning algorithm projecting Northern Bobwhite, and Cassin’s Sparrow distributions improved with the addition of grazing pressure as a predictor. However, predictive power of the Eastern Meadowlark SDM decreased with the addition of grazing pressure (AUC = 0.79, 0.78; respectively) while the TSS remained high (0.61, 0.75, respectively). Although not a rare or endangered species, this was the species of the lowest prevalence in the study and supports the concept suggested by Freeman and Moisen (2008) that default probability thresholds may not be appropriate at low prevalence, and that the intersection where Sensitivity + Specificity is maximum could serve as a more ideal probability threshold for species presence. We did not perform this analysis here but is an area of interest for future research in improving SDMs.

BRT

The BRT performed best with Eastern Meadowlark SDMs (AUC = 0.89), and all species’ models improved with the addition of grazing pressure as a predictor. This algorithm has the unique advantage to accommodate collinearity among predictors and fit complex nonlinear relationships between response and predictor variables (Elith et al. 2008; Franklin 2010). Among the SDMs projecting Cassin’s Sparrow distribution, the BRT had the highest model sensitivity (TSS = 0.67). The BRT requires two user-input parameters: learning rate (lr), which determines the contribution of each decision tree to the overall model, and tree complexity (tc), which controls whether interactions are fitted (Elith et al. 2008). Ideally, parameters should be optimized based on sample size, number of predictors, intended use of the model, etc. to avoid overfitting the model. However, for the purposes of this study, we maintained consistent parameters to directly compare model performance (lr = 0.001, tc = 6). This may have contributed to the poor predictive performance of the BRT in projecting Northern Bobwhite distribution relative to the other two species.

RF

This regression-based machine-learning algorithm performed best for Eastern Meadowlark SDMs (AUC = 0.95) and produced the most powerful SDMs for all species. All models built using this algorithm improved with the addition of grazing pressure as a predictor, and model sensitivity was relatively consistent compared to the output of the other SDMs. Whereas the BRT requires the user to alter input parameters to ensure the model is not over fitted, RF has the advantage of a built-in “safe-guard” against overfitting in that each decision tree which uses a random bootstrap aggregation to subsample the given data (Breiman 2001; Prasad et al. 2006). RF is growing in popularity among ecologists for SDM and shows great promise for advanced SDM applications since it makes no assumptions on data distributions.

Conclusions

Our findings suggest that model selection for SDM should include consideration of species prevalence and machine-learning algorithms should be preferred when the target species is of low or unknown prevalence. For example, rangeland ecologists building SDMs for a species that is either rare across its range or of unknown abundance are able to select or alter the probability threshold of species presence in machine-learning algorithms. This is especially valuable since SDMs build based on the default probability threshold (0.5) used for rare or endangered species could lead to misinformed conservation plans and refuge networks. This new approach in spatially quantifying and including livestock grazing pressure as an indirect variable in SDMs has broad implications in rangeland ecology since it addresses a weakness in the current SDM framework—the exclusion of biotic and indirect relationships. With this, we can better estimate the effects of varying grazing regimes on grassland bird populations and more accurately predict the distribution of species of interest

Further, our results imply livestock grazing has indirect influence on grassland bird species’ distributions and should be included in SDMs as an indirect variable in addition to direct, associated vegetative changes. This is especially important for ground-dwelling species (e.g., Northern Bobwhite). For instance, more advanced boosting or machine-learning algorithms (e.g., boosted regression tree and random forest) that can accommodate limited data, complex and non-linear relationships, and collinearity among predictors could inform a rangeland ecologist if the redistribution, or absence of breeding quail on a property, is more heavily influenced by the absence of rainfall during drought conditions (an indirect effect) or the resulting senescence of vegetation (a direct effect of drought). Algorithms that can tease apart these effects can help inform effective, science-based management.