Introduction

There is great interest in developing approaches for translational ecology (Schlesinger 2010; Enquist et al. 2017), in which research is designed to provide stakeholders with information they can use to address challenges. Because three-quarters of global food crops benefit from the pollination services of bees and other animals, growers are particularly interested in increasing bee populations on their farms (Klein et al. 2007). Abundant and diverse wild-bee communities have been associated with increased yield in multiple cropping systems around the world, even in the presence of managed bees (Garibaldi et al. 2013). Thus, landowners and government agencies have invested substantially in land management practices to increase the abundance and diversity of pollinator populations (Ansell et al. 2016). However, conservation practices are often implemented at spatial scales far smaller than the area over which bees forage for nutritional resources. For example, in the mid-Atlantic USA, the mean foraging range of wild-bee communities is approximately 500 m (Bartomeus et al. 2013; Kammerer et al. 2016a). A landscape with a radius of 500 m has an area of 79 ha, but the mean size of habitat patches in this region is < 0.17 ha (M Kammerer, unpublished analyses), and pollinator habitat is frequently installed in small patches (Hopwood 2008; Morandin and Kremen 2013). In addition, despite documented seasonal shifts in pollinator communities and in the floral resources on which they depend for nutrition, temporally resolved information on pollinator and floral communities is typically not considered. To best inform habitat restoration and management for pollinators, it is necessary to understand how local and landscape quality jointly influence pollinator communities across seasons.
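To make the scale mismatch concrete, the short Python sketch below reproduces the area arithmetic using only the figures stated above (a 500-m foraging radius and a mean habitat-patch size of 0.17 ha). The function names are illustrative and are not part of the study's workflow.

```python
import math

M2_PER_HA = 10_000  # square metres per hectare


def circle_area_ha(radius_m: float) -> float:
    """Area (ha) of a circular landscape with the given radius in metres."""
    return math.pi * radius_m ** 2 / M2_PER_HA


def patch_share(patch_ha: float, radius_m: float) -> float:
    """Fraction of a circular landscape covered by a single habitat patch."""
    return patch_ha / circle_area_ha(radius_m)


landscape = circle_area_ha(500)   # ~78.5 ha, reported in the text as 79 ha
share = patch_share(0.17, 500)    # ~0.0022
print(f"{landscape:.1f} ha landscape; a 0.17-ha patch covers {share:.2%} of it")
```

Because the mean patch is smaller than 0.17 ha, a typical patch covers less than about 0.2% of the area over which a bee community forages.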

If landscape quality is a primary driver of bee communities, and conservation practices improve only a small section of a landscape, how likely are conservation practices to benefit wild-bee populations? A quantitative comparison of local and landscape effects is needed to decide not where, but whether, adding small patches of wild-bee habitat is likely to produce a measurable increase in bee abundance or richness (Gonthier et al. 2014), although effects on populations are challenging to evaluate at larger scales (Kleijn et al. 2018; Scherber et al. 2019). Quantitative syntheses of agroecosystems across the globe found that wild-bee abundance increased with landscape complexity and with diverse local plant communities or vegetation types (Kennedy et al. 2013; Shackelford et al. 2013). Richness of all wild bees also increased with the quantity and proximity of favorable habitat at the landscape scale, and, for solitary bees only, locally diversified fields supported more species (Kennedy et al. 2013). Studies comparing scales have shown that local context can matter as much as (Schubert et al. 2022) or more than landscape quality (Coutinho et al. 2018; Rollin et al. 2019), but there are also examples where landscape effects dominated (Bartholomée et al. 2020; Griffin et al. 2021; Coutinho et al. 2021) or compensated for intensively managed agriculture or local contexts that are otherwise challenging for wild-bee communities (Rundlöf et al. 2008; Papanikolaou et al. 2017).

Between spring and summer, the composition of wild-bee communities changes substantially, which could lead to seasonal variation in the importance of local and landscape resources. In the mid-Atlantic USA, there is substantial turnover between the bee species present in early season (before late May) and those present in mid to late season (July–September; Kammerer 2021). Specifically, from April to June, bee communities in each month are distinct, whereas bee species present from July to September (termed ‘late season’ from here forward) differ from early-season species but do not vary monthly (Turley et al. 2022). Changes in community composition bring seasonal variation in several important functional traits, including voltinism, overwintering location, sociality, and body size (Osorio-Canadas et al. 2018). Body size is strongly linked to species’ typical foraging distances (Greenleaf et al. 2007) and likely dictates the relative importance of local and landscape resources. However, to our knowledge, no studies have examined seasonal variation in the dependence of bees on local vs. landscape resources. Moreover, habitats vary widely in the magnitude and timing of the floral and nesting resources they provide for bees (Ogilvie and Forrest 2017), which can lead to temporally variable relationships between wild-bee communities and land use (Cole et al. 2017; Galpern et al. 2021).

To facilitate translating results of landscape-scale studies into applied management and conservation, pollination ecologists should quantify landscape quality based on specific, seasonal resources and risks. Most studies have described landscape quality using broad landscape metrics, such as percent semi-natural habitat (Ricketts et al. 2008; Kennedy et al. 2013; but see Guezen and Forrest 2021; Smart et al. 2021; Bloom et al. 2021; Eckerter et al. 2022). However, these metrics are less useful for applied conservation decisions because land use is an indirect, rather than direct, driver of bee abundance and diversity (Roulston and Goodell 2011). Land-use patterns influence floral and nesting availability (Williams and Kremen 2007), pesticide risk, and disturbance regimes (mowing, tillage, logging), so representing landscape quality solely by the amount of semi-natural habitat cannot disentangle the relative importance of multiple drivers. Furthermore, semi-natural habitat types vary in the amount of food resources they provide (Bartual et al. 2019), and documenting bee responses only to broad land-use patterns precludes determining which habitats are most important for bees and how and when to offset resource scarcity.

To address these knowledge gaps and inform conservation practices for wild bees, we studied season-specific relationships between local and landscape quality and wild-bee communities at 33 sites in the Finger Lakes region of New York, USA, over a two-year period. We paired field surveys of wild bees, plants, and soil characteristics at each site with a multi-dimensional assessment of landscape quality, including data generated from a novel, spatio-temporal evaluation of floral resources at the local and landscape scales (Iverson et al., unpublished data). We asked three research questions: 1) What is the relative importance of local versus landscape scale in driving bee richness and abundance? 2) Which metrics of local and landscape quality best explain wild-bee abundance and richness? 3) How do the relationships in (1) and (2) vary with season?

Materials and methods

Study region and site selection

We studied wild-bee communities at 33 sites in the Finger Lakes region of central New York State, USA (Fig. 1). The Finger Lakes region is composed of approximately 42% semi-natural land (almost all forest habitats), 8% developed land, and 49% agriculture, including pastureland (USDA NASS 2018). There is a regional gradient of landscape composition, with high forest cover in the south and increasing agricultural land moving north and closer to the lakes (Fig. 1). Near Seneca and Cayuga lakes, the climate is relatively moderate and particularly well suited to specialty crops such as wine grapes and tree fruits. In general, crop diversity in this region is high, especially on small farms, which are common: twelve percent and 42% of farms are smaller than 4 ha and 20 ha, respectively (USDA NASS 2019). Because farms are small and semi-natural habitat is relatively abundant, many agricultural areas in our study region could still be considered ‘complex’ landscapes (Tscharntke et al. 2005). For example, within 1 km, only five of our 33 study sites had less than 20% semi-natural habitat, and none had less than 1% semi-natural habitat (a ‘cleared’ landscape as defined by Tscharntke et al. 2005).

Fig. 1

Map of study sites in the Finger Lakes region of New York, USA. Bees were sampled in spring and summer 2018–2019 at 33 sites in seven habitats defined by Iverson et al. (unpublished data): mesic upland remnant forests, floodplain forests, forest edges, old fields, roadside ditches, mixed vegetable farms, and apple orchards

Stratifying by habitat type, we selected our 33 study sites from 144 locations included in a previous study documenting plant community composition and local and landscape floral resources (Iverson et al. unpublished data; see ‘Plant species richness, abundance, community composition, and floral area’ below for an overview of their methodology). The selected sites belonged to seven habitat types spanning a range from semi-natural to managed land use: mesic upland remnant forest (i.e., not previously cleared for crops), floodplain forest, forest edge, old field, roadside ditch, mixed vegetable farm, and apple orchard. Forest edge plots were located adjacent to the boundary between upland forest and a neighboring low-vegetation habitat (usually cropland), old fields were post-agricultural areas dominated by grasses and goldenrod (Solidago spp.; Euthamia graminifolia), and mixed vegetable farms were relatively small-scale farms that grew a diversity of fruits and vegetables. We chose sites to obtain four to five sites of each habitat type while maintaining a minimum distance between sites. To ensure that we sampled independent bee communities, every site was at least 1 km from all other sites, a distance that, in our study region, exceeds the mean foraging range of a typical wild-bee community (Kammerer et al. 2016a).
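The 1-km spacing rule can be checked with pairwise great-circle distances. This is a minimal sketch assuming site locations are stored as latitude/longitude pairs and a spherical Earth (haversine formula); the coordinates and function names are hypothetical, and the study may have measured distances in a projected coordinate system instead.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def too_close_pairs(sites, min_dist_m=1000.0):
    """List (site_a, site_b, distance_m) for every pair closer than min_dist_m.

    sites: dict mapping a site ID to a (latitude, longitude) tuple.
    """
    flagged = []
    for (id_a, (lat_a, lon_a)), (id_b, (lat_b, lon_b)) in combinations(sites.items(), 2):
        d = haversine_m(lat_a, lon_a, lat_b, lon_b)
        if d < min_dist_m:
            flagged.append((id_a, id_b, d))
    return flagged


# Hypothetical coordinates for illustration only (not the study's sites).
candidate_sites = {"A": (42.440, -76.500), "B": (42.445, -76.500), "C": (42.500, -76.500)}
print(too_close_pairs(candidate_sites))  # A and B are ~556 m apart
```

An empty list means every candidate pair satisfies the minimum spacing.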

Wild-bee survey

At each of our 33 sites, in 2018 and 2019, we measured bee species richness and abundance using bee bowls, following the protocol of a long-term bee monitoring program in our region (Droege et al. 2016). In each year, we sampled early-season bees in late April/early May and late-season bees in mid-July; dates were selected to correspond to peak floral abundance in forest, wetland, and successional habitats (Iverson et al. unpublished data). Despite monthly turnover in early-season bees (Turley et al. 2022), logistical constraints limited us to one sampling round for early-season bees. To partially compensate, in early season we deployed bee bowls for 14 d rather than for the shorter periods more commonly used, e.g., 1 or 7 d (Kammerer et al. 2016b; Droege et al. 2016). In July, trap liquid evaporated more quickly, which limited our sampling to 7 d. We filled fluorescent blue, fluorescent yellow, and white 355-mL Solo polystyrene plastic cups with a 50:50 mixture of propylene glycol and water and placed the bowls at the height of the dominant vegetation. At each site, we arranged bee bowls along an 80-m transect in visible areas, alternating bowl colors with 10 m between bowls, for a total of 9 bowls per site. We placed the middle of each transect at the center of Iverson et al.’s (unpublished data) plant survey plot, located from the recorded GPS coordinates or, at some locations, from physical flags marking the plot outline.
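The transect geometry follows directly from the numbers above: nine bowls with 10 m between neighbors span 8 × 10 = 80 m. The sketch below assumes a repeating blue, yellow, white color sequence (the exact order is not stated) and centers the transect on the plant plot as described; all names are illustrative.

```python
from dataclasses import dataclass

COLORS = ("blue", "yellow", "white")  # assumed repeating order of bowl colors


@dataclass
class Bowl:
    position_m: float  # distance from the start of the transect
    color: str


def transect_layout(n_bowls=9, spacing_m=10.0):
    """Evenly spaced bowls along a straight transect, cycling through COLORS."""
    return [Bowl(i * spacing_m, COLORS[i % len(COLORS)]) for i in range(n_bowls)]


def centered_positions(layout):
    """Offsets (m) from the transect midpoint, which sits at the plant-plot center."""
    mid = (layout[0].position_m + layout[-1].position_m) / 2
    return [b.position_m - mid for b in layout]


layout = transect_layout()
print(layout[-1].position_m)       # transect length: 80.0 m
print(centered_positions(layout))  # -40.0 ... 40.0 in 10-m steps
```

With three colors and nine bowls, each color appears three times.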

After collection, we stored bee specimens in 70% ethyl alcohol solution until pinning and sorting. We washed, pinned, and identified bee specimens, to species when possible. Dr. David Biddinger (Penn State Center for Pollinator Research, Biglerville, PA) and Dr. Rob Jean (Environmental Solutions & Innovations, Inc., Indianapolis, IN) identified our specimens using published dichotomous keys (Mitchell 1960, 1962; Michener et al. 1994; Michener 2007) and interactive guides to bee identification available at Discover Life (Ascher and Pickering 2013). We sorted specimens in the Nomada “bidentate” group to morphospecies, as this group is poorly resolved in existing taxonomic keys (Droege et al. 2010; Ascher and Pickering 2013). Also, some Lasioglossum specimens were damaged during collection or processing (n = 50), so we could not reliably determine species identity. We excluded these specimens from richness analyses that required species-level identification. Excepting Apis mellifera specimens, which we assumed were from managed colonies, we deposited all bee specimens collected for this study in the Frost Entomological Museum at The Pennsylvania State University in University Park, PA. We also provided the Frost Entomological Museum with a digital copy of metadata on all specimen labels (date, location, method of collection, species determination, etc.).

Metrics of local habitat quality

For each site, we measured soil characteristics, as many species of wild bees nest in the soil (Harmon-Threatt 2020) and because soil fertility influences floral abundance, quality and quantity of floral rewards, and resulting bee visitation (Carvalheiro et al. 2021). We also calculated plant species richness, community composition, and floral area from existing data (Iverson et al. unpublished data).

Soil collection and processing

In May 2018, we collected soil from each of our 33 study sites. Along the bee sampling transect, we collected five soil samples with a bucket auger to a depth of 9–18 cm, depending on rock and moisture content. We took shallower soil cores (9–12 cm) at sites with very rocky or wet (floodplain habitat) subsoil. Very high rock content or saturated subsoils would likely inhibit wild-bee nesting (Harmon-Threatt 2020), so we considered the shallower sampling depth representative of the most favorable zone for bee nests. To quantify bulk density, at two locations along the bee transect, we collected three undisturbed soil cores (0–3 cm, 4–6 cm, and 7–9 cm deep) with a slide hammer sampler (Soilmoisture Equipment Corp, Goleta, CA). Because of high moisture or rock content at some sites, we could collect only two bulk density cores there, but we recorded the number of cores in each sample. For processing and analysis, we combined bulk density cores from all depths.

After collection, we measured the wet mass of all soil samples, dried them in an oven at 60 °C for five days (or until their mass stopped decreasing), and measured dry mass. We calculated bulk density from dry mass and core volume and sent bucket-auger samples to the Penn State Agricultural Analytical Services Laboratory, which measured pH, P, K, Mg, Ca, Zn, Cu, S, total nitrogen by combustion, percent organic matter, and percentage sand, silt, and clay via standard laboratory methods (Penn State Agricultural Analytical Services Lab n.d.). To summarize trends in soil characteristics, we centered and standardized soil variables and conducted a principal components analysis with the stats package in R (R Core Team 2021).
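The soil calculations above can be sketched in a few lines. Bulk density is dry mass divided by the total volume of the pooled cores; the core volume used here is a placeholder, since the sampler's dimensions are not given. The study ran its PCA with R's stats package; as a standard-library stand-in, this sketch standardizes each variable and extracts the first principal component of the resulting correlation matrix by power iteration. The soil values are hypothetical.

```python
from statistics import mean, stdev


def bulk_density(dry_mass_g, n_cores, core_volume_cm3):
    """Dry bulk density (g/cm^3): dry mass of pooled cores / their total volume."""
    return dry_mass_g / (n_cores * core_volume_cm3)


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def first_pc(rows, n_iter=200):
    """Loadings of the first principal component of standardized data.

    rows: one list of soil variables per site. Uses power iteration on the
    correlation matrix, so the sign of the returned unit vector is arbitrary.
    """
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, p = len(rows), len(cols)
    # Correlation matrix: cross-products of z-scores divided by (n - 1).
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Hypothetical: 150 g of oven-dry soil from 3 cores of (assumed) 50 cm^3 each.
print(bulk_density(150.0, 3, 50.0))  # 1.0 g/cm^3

# Hypothetical soil table (rows = sites; columns = three soil variables).
soil = [[5.0, 10.0, 50.0], [6.0, 12.0, 30.0], [7.0, 14.0, 40.0], [8.0, 16.0, 10.0]]
print(first_pc(soil))
```

Power iteration converges to the leading eigenvector as long as the starting vector is not orthogonal to it; in practice, R's `prcomp` or a full eigendecomposition would also return the remaining components.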

Plant species richness, abundance, community composition, and floral area

We designed our study to leverage an existing, comprehensive plant survey conducted in the greater Ithaca region, NY (Iverson et al. unpublished data). In 2016 and 2017, Iverson et al. documented plant species richness and abundance at 144 sites across the most common habitat types (N = 22) in the surveyed region, spanning the broader classes of forest, agriculture, wetland, successional, and developed land. Briefly, Iverson et al. surveyed 3–20 sites of each habitat type depending on variability among sites, with a minimum of 5 and a mean of 7.5 sites for each of the seven habitat types included in this study. They used a Modified Whittaker plot design (Stohlgren et al. 1995) with halved dimensions, i.e., a 10 × 25 m plot with nested subplots of varying sizes. They recorded plant cover by species in ten 0.25-m² quadrats and species presence in all other subplots and in the full plot to estimate overall plant coverage. In addition, they recorded the abundance of all mature (i.e., potentially flowering) angiosperm trees by species.

Using a combination of survey data and flower density and size measurements for each observed species, Iverson et al. (unpublished data) estimated floral area (FA) per species at each site. To estimate flower density for each species, they counted the flowers in a 0.25-m² quadrat during peak bloom, and they measured flower size in the field, from herbarium specimens, or from published sizes in online plant databases. Then, using observed bloom dates or dates published in a regional flora, they estimated FA across the bloom window of each plant species. Next, for each site, Iverson et al. calculated daily FA by summing the FA, weighted by percent cover, of each flowering plant species present (excluding apple blossoms in the ‘apple orchard’ habitat, to capture only orchard-floor floral resources). Finally, they calculated the daily average FA per habitat type by averaging FA across sites. All floral-area data are presented as floral area (m²) per hectare of sampled habitat.
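The site-level floral-area calculation can be illustrated with a simplified model. This sketch assumes a flat bloom curve (peak flower density on every day within a species' bloom window), which simplifies the phenology-based estimates described above; the species, densities, flower sizes, and bloom dates are hypothetical.

```python
from dataclasses import dataclass

M2_PER_HA = 10_000


@dataclass
class Species:
    name: str
    flowers_per_m2: float    # flower density at peak bloom (per m2 of the species' cover)
    flower_area_m2: float    # display area of one flower
    bloom_start_doy: int     # first day of year in bloom
    bloom_end_doy: int       # last day of year in bloom
    insect_pollinated: bool = True


def daily_site_fa(species_cover, day, ip_only=False):
    """Floral area (m2 per ha of habitat) at one site on one day of year.

    species_cover: list of (Species, percent_cover) pairs recorded at the site.
    Simplifying assumption: each species shows its peak density on every day
    of its bloom window and zero outside it.
    """
    total = 0.0
    for sp, cover_pct in species_cover:
        if ip_only and not sp.insect_pollinated:
            continue
        if sp.bloom_start_doy <= day <= sp.bloom_end_doy:
            fa_full_cover = sp.flowers_per_m2 * sp.flower_area_m2 * M2_PER_HA
            total += fa_full_cover * cover_pct / 100  # weight by percent cover
    return total


def habitat_mean_fa(sites, day, ip_only=False):
    """Habitat-level daily FA: the mean of site-level FA over sites of one habitat."""
    return sum(daily_site_fa(s, day, ip_only) for s in sites) / len(sites)


# Hypothetical species and site.
forb = Species("hypothetical forb", 400.0, 0.0001, 200, 260)
grass = Species("hypothetical grass", 100.0, 0.001, 120, 150, insect_pollinated=False)
site = [(forb, 50.0), (grass, 20.0)]
print(daily_site_fa(site, 210))                # forb only: ~200 m2 per ha
print(daily_site_fa(site, 130, ip_only=True))  # wind-pollinated grass excluded: 0.0
```

Evaluating `daily_site_fa` for every day of year yields the site-level FA curve that the summary metrics are computed from.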

For our analysis, we summarized site-level FA curves with seven metrics: season total FA, minimum FA, maximum FA, FA coefficient of variation, and total FA in spring, summer, and fall. For total, maximum, and coefficient of variation, we summarized FA from April to mid-November. We quantified minimum FA over a narrower timeframe associated with more rapid plant growth (mid-May to mid-September) to avoid assigning all sites similarly low values from the beginning or end of the season. We defined ‘spring’ as early April to mid-June (day-of-year 92 to 163), ‘summer’ as mid-June to late August (day-of-year 164 to 238), and ‘fall’ as late August to mid-November (day-of-year 239 to 310). We generated two versions of each FA metric: one representing the FA of all plants, and one including only plant species known to be pollinated by insects (‘IP plants’, Iverson et al. unpublished data). IP plants generally provide both nectar and pollen, whereas wind-pollinated species provide primarily pollen.
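The seven curve summaries can be expressed as window operations on a daily FA series. The spring, summer, and fall day-of-year windows below come from the text; the full-season window (DOY 92–310), the mid-May to mid-September window for the minimum (DOY 135–258, non-leap year), and the use of the sample standard deviation for the coefficient of variation are assumptions.

```python
from statistics import mean, stdev

SPRING = (92, 163)        # day-of-year windows from the text
SUMMER = (164, 238)
FALL = (239, 310)
FULL = (92, 310)          # assumption: April through the end of the 'fall' window
MIN_WINDOW = (135, 258)   # assumption: 15 May to 15 September (non-leap year)


def window(curve, bounds):
    """FA values for days within [lo, hi], inclusive, in day order."""
    lo, hi = bounds
    return [fa for day, fa in sorted(curve.items()) if lo <= day <= hi]


def summarize_fa(curve):
    """Seven summary metrics for a daily floral-area curve {day_of_year: FA}."""
    full = window(curve, FULL)
    return {
        "season_total": sum(full),
        "min": min(window(curve, MIN_WINDOW)),
        "max": max(full),
        "cv": stdev(full) / mean(full),  # assumption: sample SD (n - 1 denominator)
        "spring_total": sum(window(curve, SPRING)),
        "summer_total": sum(window(curve, SUMMER)),
        "fall_total": sum(window(curve, FALL)),
    }


# Hypothetical flat curve with a single summer peak.
curve = {day: 1.0 for day in range(92, 311)}
curve[200] = 5.0
print(summarize_fa(curve))
```

Because the three seasonal windows tile the full window, the spring, summer, and fall totals add up to the season total.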

From the plant survey data, we also quantified plant species richness, relative abundance, and community composition for our 33 study sites. We calculated species richness of all plants, species richness of insect-pollinated plants, and percent cover of all vegetation. Iverson et al. (unpublished data) used non-metric multi-dimensional scaling (NMDS) ordination to quantify two main gradients in plant-community composition, corresponding to variation in management intensity (from unmanaged forest and wetland habitats to heavily managed agricultural land) and in moisture availability. To determine how plant communities associated with management intensity and moisture affect bee abundance and richness, we included the NMDS axis scores of our 33 study sites from Iverson et al. as predictor variables in our analyses.

For all metrics of plant composition and floral area, we used data from plant surveys conducted two years before our bee sampling (2016–2017 for plant surveys and 2018–2019 for bee surveys). In the semi-natural to natural habitats included in our study (mesic upland remnant forest, floodplain forest, forest edge, and old field), we expected little change in plant communities over the two years between plant and bee sampling because these habitats are not subject to frequent human intervention (mowing, spraying, or tillage). For the managed habitats (roadside ditches, mixed vegetable farms, and apple orchards), the gap between plant and bee sampling could have contributed to error in our analyses. However, rather than relying on annual flower counts, Iverson et al. (unpublished data) surveyed the full plant community at each site and modelled floral area from regional mean flower density and phenology; this approach depicts mean flowering over time while emphasizing differences between sites and habitat types. We judged it to be a good fit for datasets like ours, in which plants and bees were sampled in different years. Based on our field observations, none of our sites experienced major disturbances such as fire or pest outbreaks.

Metrics of landscape quality

Landscape composition and configuration

To describe the land cover surrounding our study sites, we used a high-resolution map of land cover available for our study region (Li et al. 2024). This product utilized a regional 1 m resolution dataset of land cover (Chesapeake Conservancy 2013) to differentiate impervious surface, trees, and low vegetation and a regional natural habitat map (Ferree and Anderson 2013) and the USDA Cropland Data Layer (USDA NASS 2018) to resolve more detailed natural and agricultural habitats, respectively. Wetland habitats were incorporated using the National Wetlands Inventory data (U.S. Fish and Wildlife Service 2017) and a ‘roadside ditch’ habitat class was created by adding a 3 m buffer on either side of non-urban roads.

From this high-resolution land-cover map, we calculated landscape composition and configuration within a 1-km radius of each sampling site. We selected this distance because, excluding large-bodied Bombus spp., most wild-bee species in our region forage within 1 km of their nests (Kammerer et al. 2016a, b). We grouped land-cover classes into six metrics of landscape composition: percentage of the landscape in agriculture, forest, successional (old field and shrubland), wetland, water, and developed habitats (Table S1).

Based on previous research examining effects of landscape configuration on wild bees (Kennedy et al. 2013), we also calculated six landscape configuration metrics to represent the aggregation, shape, and diversity of habitat patches around our sampling sites. Specifically, using the landscapemetrics package, version 1.4.2, in R (Hesselbarth et al. 2019; R Core Team 2021), we calculated edge density, Shannon diversity of land-cover classes, Simpson diversity of land-cover classes, mean perimeter-area ratio, variation in nearest-neighbor distance between patches of the same class, and the interspersion and juxtaposition index. We predicted that higher edge density and mean perimeter-area ratio would be correlated with more diverse pollinator communities because, in the mid-Atlantic USA, forest edges and hedgerows often have more diverse plant communities and higher floral abundance than core forest or adjacent agricultural land (Kammerer et al. 2016b; Iverson et al. unpublished data). We expected landscapes with a high diversity of land-cover classes to support higher bee richness because diverse landscapes are more likely to contain habitat specialists and rare species (Harrison et al. 2019).
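For readers unfamiliar with the configuration metrics, the sketch below computes Shannon diversity (−Σ p ln p), Simpson diversity (1 − Σ p²), and edge density on a toy raster. It is a simplified illustration rather than the landscapemetrics implementation: edges are counted only between 4-connected cells of different classes, the outer boundary is ignored, and the grid is hypothetical.

```python
import math
from collections import Counter


def class_proportions(grid):
    """Proportion of cells in each land-cover class."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def shannon_diversity(grid):
    """Shannon diversity of land-cover classes: -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in class_proportions(grid).values())


def simpson_diversity(grid):
    """Simpson diversity of land-cover classes: 1 - sum(p^2)."""
    return 1 - sum(p * p for p in class_proportions(grid).values())


def edge_density(grid, cell_size_m):
    """Edge length (m) between different classes per hectare of landscape.

    Counts shared sides of 4-connected cells with different classes; the outer
    boundary of the grid is not counted as edge.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    shared_sides = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols and grid[r][c] != grid[r][c + 1]:
                shared_sides += 1
            if r + 1 < n_rows and grid[r][c] != grid[r + 1][c]:
                shared_sides += 1
    area_ha = n_rows * n_cols * cell_size_m ** 2 / 10_000
    return shared_sides * cell_size_m / area_ha


# Hypothetical 2 x 2 landscape of 10-m cells: forest above cropland.
toy = [["forest", "forest"], ["crop", "crop"]]
print(shannon_diversity(toy), simpson_diversity(toy), edge_density(toy, 10.0))
```

In the toy grid, two 10-m cell sides separate forest from cropland in 0.04 ha, giving an edge density of 500 m/ha.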

Landscape insecticide toxic load

We included insecticide risk to wild bees in our quantification of landscape quality because abundant evidence shows that insecticide exposure can negatively affect bee behavior and reproduction and that insecticides used locally can have far-reaching consequences (Goulson et al. 2015; Long and Krupke 2016). To represent risk to wild bees from insecticides applied to agricultural crops, we calculated an index of insecticide toxic load (Douglas et al. 2022) using our high-resolution land-cover maps and insecticide data from 2014, the most recent year with a complete, publicly available dataset of insecticide application in U.S. agricultural crops. We estimated insecticide risk at our sampling sites in three steps. First, we calculated a distance-weighted metric of landscape composition from the high-resolution land-cover data (see Landscape composition and configuration) with the ‘distweight_lulc’ function in the beecoSp R package (Kammerer and Douglas 2021). For distance weighting, we selected a wild-bee foraging range of 500 m, which assumes that, from the center of each study site, 70% and 100% of bee foraging occurs within 500-m and 1-km radii, respectively. Second, for each land-cover class, we multiplied the distance-weighted area by the insecticide toxic-load coefficient for that land cover. Finally, for each study site, we calculated total insecticide toxic load as the sum of insecticide load from all land-cover classes within the 1-km landscape. Without a priori knowledge of the most likely route of insecticide exposure for each active ingredient, we used insecticide values corresponding to the mean of oral and contact toxicity (Douglas et al. 2022).
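The three-step toxic-load calculation can be sketched as follows. The exponential distance-decay weight (scale 500 m, truncated at 1 km) is an assumption chosen for illustration; it is not the formula used by beecoSp's `distweight_lulc` and is not calibrated to the 70%/100% foraging split described above. Land-cover classes, cell areas, and toxic-load coefficients are hypothetical.

```python
import math
from collections import defaultdict


def exp_weight(distance_m, scale_m=500.0, max_dist_m=1000.0):
    """ASSUMED distance-decay weight: exp(-d / scale), zero beyond max_dist_m."""
    return math.exp(-distance_m / scale_m) if distance_m <= max_dist_m else 0.0


def distance_weighted_area(cells, weight=exp_weight):
    """Distance-weighted area (ha) per land-cover class.

    cells: iterable of (land_cover, distance_to_site_m, area_ha), e.g. raster cells.
    """
    totals = defaultdict(float)
    for cover, dist_m, area_ha in cells:
        totals[cover] += area_ha * weight(dist_m)
    return dict(totals)


def toxic_load(cells, coefficients, weight=exp_weight):
    """Sum over classes of distance-weighted area x toxic-load coefficient.

    Assumption: classes missing from `coefficients` contribute zero.
    """
    weighted = distance_weighted_area(cells, weight)
    return sum(area * coefficients.get(cover, 0.0) for cover, area in weighted.items())


# Hypothetical cells and coefficients for illustration only.
cells = [("corn", 0.0, 1.0), ("forest", 0.0, 2.0), ("corn", 2000.0, 5.0)]
coefficients = {"corn": 10.0, "forest": 0.0}
print(toxic_load(cells, coefficients))  # the cell at 2 km lies outside the 1-km landscape
```

Passing a different `weight` function lets the same code reflect another distance-decay assumption.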

Floral area of landscapes

We quantified the floral area of landscapes in three steps. First, we averaged floral area across all sites within a land-cover class to calculate habitat-level FA per day per ha of habitat. Second, for each land-cover class within 1 km of our study sites, we multiplied its distance-weighted area (ha) in the landscape (see Landscape insecticide toxic load for distance-weighting details) by the habitat-level FA. Third, we summed over all land-cover classes, yielding landscape-total FA per day for each study site. For local and landscape floral area, we calculated minimum FA, maximum FA, FA coefficient of variation, and total FA in spring, summer, and fall for all plants and for insect-pollinated plant species only (‘IP plants’, Iverson et al. unpublished data).
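Landscape floral area combines habitat-level FA with the distance-weighted area of each land-cover class. In this sketch, the weighted areas are taken as given (for example, from a distance-weighting step like the one described for insecticide toxic load); class names and values are hypothetical, and classes without surveyed floral area are assumed to contribute zero.

```python
from statistics import mean


def habitat_level_fa(site_fa_by_class):
    """Mean FA (m2 per ha) across surveyed sites of each land-cover class, for one day."""
    return {cover: mean(values) for cover, values in site_fa_by_class.items()}


def landscape_fa(weighted_area_ha, habitat_fa):
    """Landscape-total FA (m2) for one day: sum of weighted area (ha) x FA (m2/ha).

    Assumption: classes with no habitat-level FA (e.g., open water) contribute zero.
    """
    return sum(area * habitat_fa.get(cover, 0.0) for cover, area in weighted_area_ha.items())


def landscape_fa_curve(weighted_area_ha, habitat_fa_by_day):
    """Apply landscape_fa to every day of year in {day: {class: FA}}."""
    return {day: landscape_fa(weighted_area_ha, fa) for day, fa in habitat_fa_by_day.items()}


# Hypothetical values for one day.
habitat = habitat_level_fa({"old_field": [300.0, 500.0], "forest": [20.0, 40.0]})
print(landscape_fa({"old_field": 2.0, "forest": 10.0, "water": 5.0}, habitat))  # 1100.0
```

The resulting daily landscape curve can then be summarized with the same metrics used for local floral area.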

Distance to water and topography

We generated distance to water and topography metrics and included these factors in our analyses. To differentiate riparian sites very close to water, we calculated the minimum distance from our sampling locations to water (streams, rivers, ponds, or lakes). We identified water features closest to our sampling sites using the National Hydrography Dataset (U.S. Geological Survey 2019a) for New York State. Lastly, we included topographic information to represent the micro-climate at each of our sites. We calculated elevation, slope, and aspect at each sampling site from the 1/3 arc-second (approximately 10 m) resolution USGS National Elevation Dataset (U.S. Geological Survey 2019b). We included distance to water and topography as landscape variables because, excepting elevation, they describe the relationship between the sampling site and surrounding water and topographic features. However, distance to water and topography were not among the most explanatory predictor variables (Fig. 2), so if we had classified them as local variables, our results would be largely unchanged. We used the R statistical and computing language (R Core Team 2021), specifically the sf and raster packages (Pebesma 2018; Hijmans 2022), for all manipulations and analyses of geospatial data.
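Distance to water reduces to a minimum point-to-feature distance. The study used the sf package in R; this standard-library Python sketch assumes coordinates already projected to metres and represents each water feature (stream line or pond outline) as a polyline with at least two vertices. Under this simplification, a site inside a pond outline would receive the distance to the shoreline rather than zero. The coordinates are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (projected x, y in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Parameter of the perpendicular projection, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_water(site, features):
    """Minimum distance from a site to any water feature, each given as a polyline."""
    return min(
        point_segment_distance(site, line[i], line[i + 1])
        for line in features
        for i in range(len(line) - 1)
    )


# Hypothetical stream with a right-angle bend.
stream = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
print(distance_to_water((50.0, 30.0), [stream]))  # 30.0
```

Clamping the projection parameter to [0, 1] makes points beyond a segment's ends measure to the nearest vertex.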

Fig. 2

Permutation variable-importance scores from random forest models predicting wild-bee abundance and richness. To simplify interpretation, we present only variables that were among the top five predictors in at least one random forest model (n = 21 of our 66 total predictors). The absolute value of variable importance is not very meaningful because scores are relative to the variables included and to the mean squared error of each model, so we present values scaled to a maximum of 100 (Kuhn 2019). Bees were sampled in 2018 and 2019 at 33 sites in the Finger Lakes region of New York, USA. We calculated landscape variables for the area within 1 km of our study sites. For the most important variables, symbols denote the direction of the relationship (‘++’ = moderately/highly positive, ‘+’ = slightly positive, ‘−’ = slightly negative, and ‘−−’ = moderately/highly negative). We determined the direction of effects from accumulated local effects (ALE) plots (Figs. 3, 4 and 5). Variable abbreviations are as follows: ‘CV floral area’ = coefficient of variation of floral area; ‘Plant composition, mmt intensity’ = composition of the plant community associated with a gradient in management intensity (Iverson et al., unpublished data)

Statistical analyses

Sampling-effort adjustment

Prior to analysis, we adjusted bee abundance and species-richness measures to account for varying sampling effort. While collecting bee bowls, we recorded the number of traps that were cracked, tipped over, or otherwise compromised. We present abundance as bees per successful trap per day, which adjusts both for the different sampling durations in early and late season (14 and 7 d, respectively) and for lower sampling effort at sites where traps cracked or fell over. Unfortunately, for the July 2019 sampling round, we lost our record of the number of compromised traps; to the best of our ability, we recreated these data from memory shortly after the sampling period by estimating the number of compromised traps per site. Even after this adjustment, mean abundance per trap per day differed surprisingly between early and late season, which may be due to the differing sampling durations. For wild-bee richness analyses, we adjusted for uneven effort using coverage-based rarefaction and extrapolation (Chao and Jost 2012) rather than the number of successful traps, so the recreated data were not used in richness analyses. Specifically, we estimated bee richness at the mean coverage level for each season using the iNEXT package in R (Hsieh et al. 2016, 2018; R Core Team 2021).
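Both effort adjustments can be written down explicitly. The abundance rate follows the definition above (bees per successful trap per day). For richness, the function computes the Chao and Jost (2012) estimator of sample coverage from species abundance counts, including the usual adjustment when no species is observed exactly twice; the rarefaction and extrapolation to a common coverage level, which iNEXT performs, is not reproduced here. All counts are hypothetical.

```python
def abundance_rate(n_bees, traps_set, traps_compromised, days):
    """Bees per successful trap per day."""
    successful = traps_set - traps_compromised
    return n_bees / (successful * days)


def sample_coverage(counts):
    """Chao and Jost (2012) estimate of sample coverage from abundance counts.

    counts: number of individuals observed for each species (zeros ignored).
    """
    counts = [x for x in counts if x > 0]
    n = sum(counts)
    f1 = sum(1 for x in counts if x == 1)  # singletons
    f2 = sum(1 for x in counts if x == 2)  # doubletons
    if f1 == 0:
        return 1.0
    if f2 > 0:
        a = (n - 1) * f1 / ((n - 1) * f1 + 2 * f2)
    else:
        # Bias-corrected form used when there are no doubletons.
        a = (n - 1) * (f1 - 1) / ((n - 1) * (f1 - 1) + 2)
    return 1 - f1 / n * a


# Hypothetical July site: 18 bees, 9 bowls set, 3 compromised, 7-d deployment.
print(abundance_rate(18, 9, 3, 7))
# Hypothetical species counts at one site.
print(sample_coverage([1, 1, 2, 5]))
```

Coverage estimates close to 1 indicate that few additional species would be expected with more sampling; standardizing richness to a common coverage level makes sites with different effort comparable.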

Random forest models

We used random forest models to compare the relative importance of local vs. landscape variables, identify the most important individual predictor variables, and examine relationships between predictors and wild bee abundance and richness. Random forest models are robust to correlated predictors and can represent complex, non-linear relationships, which we expected in our dataset (Wright and Ziegler 2017).

In our study region, the composition of wild-bee communities in early season differs substantially from that in mid to late season (Kammerer 2021; Turley et al. 2022), so we analyzed our early and late sampling periods separately. For early-season analyses, we removed floral-area metrics that represent floral resources available in summer and fall, specifically summer total floral area and fall total floral area of all plants and of insect-pollinated (‘IP’) plants. In most habitats, peak floral area of IP plants occurred in mid to late summer (Figure S1), so for early-season analyses we also excluded maximum floral area (IP plants). Conversely, for late-season analyses, we excluded spring total floral area but retained summer total and fall total floral area; because the activity period of many bee species spans July to September (Turley et al. 2022), fall flowers in a previous year could influence bees sampled in mid-summer. The timing of maximum floral area (all plants) and minimum floral area (all plants and IP) was not consistent across our seven habitat types (Fig. 6, Figure S1), so we retained these variables in analyses of both early- and late-season bees. For each excluded floral metric, we removed the variables representing both local and landscape scales, resulting in a final set of 56 predictors for early season and 66 for late season (Table 1).
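The season-specific exclusions can be expressed as a filter that removes each listed floral metric at both scales. The metric names below are hypothetical labels for the variables named above, so this sketch does not reproduce the exact predictor counts in Table 1.

```python
# Hypothetical metric names; the study's actual variables are listed in Table 1.
EXCLUDED_METRICS = {
    "early": {
        "summer_total_fa_all", "fall_total_fa_all",
        "summer_total_fa_ip", "fall_total_fa_ip",
        "max_fa_ip",
    },
    "late": {"spring_total_fa_all", "spring_total_fa_ip"},
}
SCALES = ("local", "landscape")


def season_predictors(all_predictors, season):
    """Drop each excluded floral metric at both the local and the landscape scale."""
    drop = {f"{scale}_{metric}" for scale in SCALES for metric in EXCLUDED_METRICS[season]}
    return [p for p in all_predictors if p not in drop]


predictors = ["year", "local_max_fa_ip", "landscape_max_fa_ip", "local_max_fa_all",
              "landscape_spring_total_fa_ip", "local_summer_total_fa_all"]
print(season_predictors(predictors, "early"))
print(season_predictors(predictors, "late"))
```

Keeping the exclusions in one table makes it easy to confirm that local and landscape versions of a metric are always dropped together.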

Table 1 List of all variables included in our study including spatial group, units of measure, references, and relevant notes

Random forest models generally perform well with a large number of predictor variables, even without excluding redundant variables in advance (Humphries et al. 2018), in part because variables are resampled so that no individual tree includes all predictor variables (Breiman 2001). Given the size of our dataset and the number of variables sampled at each tree split (mtry parameter), each decision tree in our random forests included a mean of 8 and a maximum of 18 variables. We assessed the performance of all random forest models using tenfold cross-validation repeated 10 times and, with a grid search, selected the optimum number of trees (1000 to 5000, in increments of 1000) and number of variables at each tree split (4 to 18, in increments of 2). Although random forests are not generally prone to overfitting (Humphries et al. 2018), we checked whether our models were overfit by examining prediction error on training data, as very low training error can indicate that a model is overfit (Hastie et al. 2017). We selected root mean squared error (RMSE) as our primary performance metric, which is common for applications of machine learning to regression problems (Hastie et al. 2017). Even for training data, RMSE of our models was very high (across training cross-validation folds, median RMSE was 45–64% of mean bee abundance or richness), indicating little evidence of overfitting but poor predictive ability. We used the random forest algorithm from the ranger package in R (Wright and Ziegler 2017), tuned the model with the caret package (Kuhn 2008, 2019), and, to examine our results (Figs. 3, 4 and 5), generated accumulated local effects (ALE) plots with the iml package (Molnar et al. 2018). ALE plots are generated from model predictions to depict the relative effect of changing one predictor variable; they are centered at zero, so each value on the curve is the difference from the mean prediction of the random forest model.
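The tuning procedure above (a 5 × 8 grid of tree counts and mtry values, each scored by tenfold cross-validation repeated 10 times) can be sketched with the standard library. Here `fit_predict` is a hypothetical callback standing in for fitting a random forest, which the study did with ranger and caret in R; only the grid, the fold assignment, and the RMSE scoring are shown.

```python
import random
from itertools import product
from math import sqrt
from statistics import mean

NUM_TREES = range(1000, 5001, 1000)  # 1000, 2000, ..., 5000
MTRY = range(4, 19, 2)               # 4, 6, ..., 18 variables tried at each split


def tuning_grid():
    """All (num_trees, mtry) combinations: 5 x 8 = 40 candidates."""
    return list(product(NUM_TREES, MTRY))


def repeated_kfold(n, k=10, repeats=10, seed=1):
    """Yield (repeat, fold, test_indices) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            yield r, f, sorted(idx[f::k])


def rmse(observed, predicted):
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def relative_rmse(observed, predicted):
    """RMSE expressed as a fraction of the mean observed value."""
    return rmse(observed, predicted) / mean(observed)


def grid_search(fit_predict, X, y, k=10, repeats=10, seed=1):
    """Return (best_params, mean_rmse) across the grid; lower RMSE is better.

    fit_predict(params, X_train, y_train, X_test) -> predictions for X_test
    is a hypothetical callback standing in for fitting a random forest.
    """
    folds = list(repeated_kfold(len(y), k, repeats, seed))
    scores = {}
    for params in tuning_grid():
        fold_rmse = []
        for _, _, test in folds:
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            preds = fit_predict(params, [X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in test])
            fold_rmse.append(rmse([y[i] for i in test], preds))
        scores[params] = mean(fold_rmse)
    best = min(scores, key=scores.get)
    return best, scores[best]
```

Expressing RMSE relative to mean abundance or richness, as `relative_rmse` does, is what allows the 45–64% figures above to be compared across response variables.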
For each season, we present ALE plots for year (Fig. 5) and for the top four variables predicting bee abundance and richness (Figs. 3, 4 and 5). We used a permutation approach via the ranger package (Wright and Ziegler 2017) to calculate variable importance scores (Fig. 2).
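The idea behind permutation importance is shown in the simplified Python sketch below. Shuffling one predictor breaks its link with the response, and the resulting increase in prediction error measures how much the model relied on it. ranger's `importance = "permutation"` option computes this on each tree's out-of-bag observations. This sketch instead permutes columns of a single dataset, so it illustrates the principle rather than reproducing ranger's scores. The function names and the toy model are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Sequence[Row]], List[float]],
                           rows: Sequence[Row], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(y, predict(rows))
    scores = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # breaks the link between feature j and y
            permuted = [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, column)]
            increases.append(mse(y, predict(permuted)) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores
```

A feature that the model never uses gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.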

Fig. 3

Relationship between landscape and local predictors and abundance (A–D) or richness (E–H) of wild bees in the early season. For both abundance and richness models, we show the top four predictors (excluding year, shown in Fig. 5), with the x-axis truncated to the 10–90% quantiles (n = 29 sites). Tick marks on the x-axis represent values of the predictor variable at each sampling location. ‘IP’ indicates insect-pollinated plants. To enable comparison of relative effects across seasons with varying mean abundance or richness (Table 2), we depict values on the y-axis as the difference from the predicted mean

Fig. 4

Relationship between landscape and local predictors and abundance (A–D) or richness (E–H) of wild bees in the late season. For both abundance and richness models, we show the top four predictors (excluding year, shown in Fig. 5), with the x-axis truncated to the 10–90% quantiles (n = 29 sites). Tick marks on the x-axis represent values of the predictor variable at each sampling location. To enable comparison of relative effects across seasons with varying mean abundance or richness (Table 2), we depict values on the y-axis as the difference from the predicted mean

Fig. 5

Difference between 2018 and 2019 in abundance (top) and richness (bottom) of wild bees in the early and late season, estimated using a random forest model. To enable comparison of relative effects across seasons with varying mean abundance or richness (Table 2), we depict values on the y-axis as the difference from the predicted mean

Results

Wild-bee communities

In both the early and late season, we documented diverse wild-bee communities. Our survey yielded 3108 specimens: 1666 in 2018 and 1442 in 2019. We collected more bees in the early season (n = 2130) than in the late season (n = 987), particularly in the second year, when our late sampling generated only 282 individuals. We documented 127 species of wild bees in 21 genera, with 94 early species and 78 late species. Andrena, Lasioglossum, and Ceratina were the most abundant early genera, representing 78% of the individuals we observed (Figure S2). At the species level, the most abundant early bees (> 5% of early-season abundance) were Andrena carlini Cockerell, Ceratina calcarata Robertson, Osmia cornifrons Radoszkowski, Ceratina dupla Say, and Andrena hippotes Robertson. In the late season, Lasioglossum was the most abundant genus, followed by several other genera in the family Halictidae (Agapostemon, Augochlora, and Halictus). Lasioglossum leucozonium Schrank, Agapostemon virescens Fabricius, Lasioglossum versatum Robertson, Peponapis pruinosa Say, and Augochlora pura Say were the most abundant late species (> 5% of late-season abundance).

Local habitat quality

Characterizing local habitat quality

We characterized several dimensions of local quality for wild bees, including floral area (Fig. 6, Figure S1), plant richness and community composition (Figure S3), and soil characteristics (Figure S4). We studied seven habitats: mesic upland remnant forest, floodplain forest, forest edge, old field, roadside ditch, mixed vegetable farm, and apple orchard. Of these, mesic upland remnant forest and forest edge had the highest floral area, peaking in early spring (Fig. 6). In summer and fall, floral resources in all habitats were generally much lower. The exception was an early-fall peak in old fields associated with the bloom period of goldenrod (Solidago spp. and Euthamia graminifolia) (Fig. 6, Figure S1).

Fig. 6

Floral area over time available in seven habitats in the Finger Lakes region of New York, USA. Floral area values represent all flowering plants. Floral area of the labelled habitat is represented with a colored polygon, while all other habitats are indicated with the grey polygons

We found significant variation in soil characteristics among the seven habitat types we sampled. Principal components analysis revealed two main gradients in our soil dataset (Figure S4). The first principal component explained 26.4% of the variation in our soil data and was associated with pH, soil texture (percent sand, silt, and clay), water content, organic matter, and total nitrogen. Roadside ditches had more basic, sandier soil than any of the other habitat types, except some floodplain forest samples. All other habitat types had loam to silt-loam soil. Interestingly, soil texture was highly variable among floodplain forest sites, spanning the full range of texture classes represented at all other sites. The second principal component explained 20.0% of the variation in soil characteristics. It was correlated with soil potassium, copper, phosphorus, zinc, clay content, and cation exchange capacity. Specifically, some vegetable farm and orchard samples had much higher potassium, phosphorus, copper, and zinc content than the other habitats, likely due to fertilizer or manure application to support crop growth. Soils at forested sites were characterized by higher organic matter and total nitrogen, possibly from leaf litter accumulation.

Local quality effects on wild bees

Comparing early and late season, we documented substantially different relationships between local quality and bee abundance (Fig. 2). Early in the season, we observed moderate effects of local characteristics on wild bees, with three local variables (soil potassium, soil phosphorus, and management intensity as reflected in plant community composition) among the top predictors of bee abundance. Abundance of early bees increased with higher levels of soil potassium and lower soil phosphorus measured at the site, with the greatest increase between approximately 75 and 125 ppm potassium (Figure S5 B-C). Abundance of early bees was lower in more-managed sites (Figure S5 A, Figure S6 A).

For late-season bees, abundance was influenced by the level of management (plant community composition), presence of fall-flowering plants, and percent cover of vegetation (Fig. 2). At the less-managed sites (remnant forests and floodplain forests, with NMDS axis 1 values less than approximately -0.1), we observed lower abundance of late bees (Fig. 4A). Bee abundance was intermediate in forest edge, old field, and ditch habitats (NMDS1 = -0.1 to 0.4) and highest in orchards and mixed vegetable farms (NMDS1 > 0.4). Sites with very low or no fall-flowering plant area also had notably lower abundance of late bees. However, once floral area exceeded approximately 10,000 m²/ha of habitat, bee abundance did not increase further with additional fall flowers (Fig. 4B).

For richness of wild bees, local quality had weak or no effects in both seasons (Fig. 2). In the early season, none of our metrics of local habitat quality were strong predictors of bee richness. In the late season, soil fertility, bulk density, and water content were the most important local predictors. We documented very slightly more species of bees at sites with intermediate soil bulk density, organic matter, and total soil nitrogen. Richness peaked at approximately 1.18 g/cm³, 5.5%, and 0.33%, respectively (Fig. 4F–H).

Landscape quality

Characterizing landscape quality

We noted substantial variation in the composition and configuration of landscapes surrounding our study sites (1 km radius), while topography was more consistent (Figure S7, Figure S8, Figure S9). In our study landscapes, the amount of developed land ranged from zero to approximately 50%, while forest and natural habitats ranged from zero to more than 70%. All landscapes surrounding our study sites were at least 18% agriculture, up to a maximum of 91% agricultural land. Distance to the nearest water source was generally low, with a maximum of 580 m. Most of our study sites had relatively low insecticide toxic load values, but insecticide toxicity was substantially higher in landscapes with significant apple and grape production (Figure S7). Edge density was the most variable configuration metric, while the interspersion and juxtaposition index and perimeter-area ratio were more consistent (Figure S9). Both Shannon and Simpson diversity indices were left-skewed, capturing the diverse agricultural, natural, and developed land cover types present in our study region (Figure S9, Figure S7).
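The two landscape diversity indices follow the definitions commonly used in landscape ecology (for example, SHDI and SIDI in FRAGSTATS). The text does not state which software computed them, so the sketch below is an assumption-labeled illustration. Shannon diversity is H' = -Σ pᵢ ln pᵢ and Simpson diversity is 1 - Σ pᵢ², where pᵢ is the proportion of the landscape in land-cover class i. The class areas in the example are hypothetical. They sum to roughly the 314 ha of a 1 km radius circle.

```python
import math
from typing import Mapping


def shannon_diversity(areas: Mapping[str, float]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over land-cover classes."""
    total = sum(areas.values())
    return -sum((a / total) * math.log(a / total) for a in areas.values() if a > 0)


def simpson_diversity(areas: Mapping[str, float]) -> float:
    """Simpson index 1 - sum(p_i^2) over land-cover classes."""
    total = sum(areas.values())
    return 1.0 - sum((a / total) ** 2 for a in areas.values())


# Hypothetical 1 km radius landscape; land-cover areas in hectares.
landscape = {"agriculture": 180.0, "forest": 90.0, "developed": 30.0, "wetland": 14.0}
print(round(shannon_diversity(landscape), 3), round(simpson_diversity(landscape), 3))
```

Both indices are zero for a landscape made up of a single class and increase as area is spread more evenly across more classes. A left-skewed distribution therefore means that most study landscapes sit toward the diverse end.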

Landscape quality effects on wild bees

Early in the season, we documented higher abundance of wild bees at locations with more wetland, surface water, natural habitat, and flowers in the surrounding landscape (Fig. 3A-D). In our study region, wetlands and surface water are a relatively small percentage of most landscapes (3.42% ± 3.54%, mean ± standard deviation of wetland area), but we found substantially higher abundance of early bees at sites with at least 4–5% wetland in the landscape compared with those that had 0–2% wetland (Fig. 3A). Between approximately 20 and 60% natural land in the landscape, abundance of early bees increased, likely due to additional floral resources and nesting sites (Fig. 3C). Lastly, we observed higher abundance of early bees with more variable (higher CV) landscape-level floral area (FA, Fig. 3D). In our study region, high variability in FA comes from multiple, high peaks in spring FA from flowering trees in forested habitats (Fig. 6).
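A higher coefficient of variation (CV) in floral area can arise from short, intense blooms rather than more flowers overall. The sketch below compares two hypothetical floral-area series with the same mean. It assumes the CV is taken across sampling dates using the sample standard deviation, a detail the text does not specify.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical landscape floral area at four sampling dates (both series have mean 10).
steady = [10.0, 10.0, 10.0, 10.0]
pulsed = [0.0, 40.0, 0.0, 0.0]  # one short, intense bloom, e.g. from flowering trees
print(coefficient_of_variation(steady), coefficient_of_variation(pulsed))  # -> 0.0 2.0
```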

For species richness of early bees, the distance to water, edge density, elevation, and the amount of water and developed land in the landscape were among the best predictors. At sites less than 100 m from water, we observed approximately two more species of early bees than at sites 400 m or more from the nearest stream, lake, river, or pond (Fig. 3H). We found more species at low-elevation sites and at locations with some water and developed land in the landscape (Fig. 3). However, most of our study landscapes had a relatively small proportion of open water and developed land (less than 1% and 10%, respectively). Density of patch edges had a very slight positive effect on richness of early bees.

Landscape effects on wild bees were weaker in the late season than in the early season. We did not detect any relationship between landscape quality and abundance of late bees, but percent agriculture was the most important predictor of richness of late bees (Fig. 2). Richness of late bees dropped above a threshold of approximately 55% agricultural land in the landscape (Fig. 4E). Insecticide toxic load, slope, and landscape configuration metrics were not among the most important predictors of bee communities in either season.

Relative importance of year, local, and landscape quality for wild bees

Generally, with our 66 local and landscape-quality metrics, we were able to explain a substantial amount of the variation in wild-bee abundance and richness. Using year, local, and landscape variables, we explained 47% and 34% of the variation in bee abundance in the early and late season, respectively (Table 2). We had more unexplained variation for species richness, with mean r-squared values of 18% and 32% for the best random forest models in the early and late season, respectively (Table 2). Across all models, however, our prediction error was very high (RMSE equal to 69–87% of mean abundance or richness, Table 2). This indicates that we cannot reliably predict bee communities at additional locations or in additional years. We therefore focused on presenting relationships between landscape and local quality and bee abundance and richness, rather than on prediction.
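The performance metrics in Table 2 relate to each other as shown in the sketch below, which computes them for one held-out fold. It makes two assumptions. First, R² is computed as the squared Pearson correlation between observed and predicted values, which we understand to be caret's default for regression. The traditional 1 - SSE/SST definition can differ and can even be negative. Second, the observed and predicted values are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, squared-correlation R^2, and RMSE as a percentage of the observed mean."""
    n = len(obs)
    errors = [p - o for o, p in zip(obs, pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs, mean_pred = sum(obs) / n, sum(pred) / n
    sxy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    sxx = sum((o - mean_obs) ** 2 for o in obs)
    syy = sum((p - mean_pred) ** 2 for p in pred)
    return {
        "rmse": rmse,
        "mae": mae,
        "r2": sxy * sxy / (sxx * syy),
        "rmse_pct_of_mean": 100.0 * rmse / mean_obs,
    }


# Hypothetical held-out fold: observed vs. predicted bee abundance at four sites.
print(regression_metrics([10, 20, 30, 40], [12, 18, 33, 37]))
```

A model can have a high squared correlation and still a large RMSE relative to the mean. This is why the text treats the RMSE ratio, not r-squared alone, as the measure of predictive ability.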

Table 2 Performance of random forest models predicting wild-bee abundance and richness at 33 sites in the Finger Lakes region of New York, USA. We show mean abundance or richness, root mean-squared error (RMSE), r-squared (variance explained), and mean absolute error (MAE) as mean ± standard deviation from tenfold cross validation, repeated 10 times. We tuned the number of variables at each tree split (mtry) and the number of trees (ntrees) with a grid search and present optimal values for each random forest model
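The caption's resampling design is tenfold cross-validation repeated 10 times, evaluated over the grid of tree counts and mtry values given in the Methods. In caret, repeated cross-validation is requested with `trainControl(method = "repeatedcv", number = 10, repeats = 10)`. The Python sketch below only mirrors the bookkeeping. `score` is a hypothetical callback assumed to fit a forest on the training indices and return RMSE on the held-out indices. The sample size of 33 is used purely for illustration.

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, held-out indices)

# Grid from the text: number of trees 1000-5000 by 1000, mtry 4-18 by 2.
GRID = list(itertools.product(range(1000, 5001, 1000), range(4, 19, 2)))


def repeated_kfold(n: int, k: int = 10, repeats: int = 10, seed: int = 0) -> List[Split]:
    """k-fold splits of n observations, reshuffled for each repeat."""
    rng = random.Random(seed)
    splits: List[Split] = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for f, held_out in enumerate(folds):
            train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            splits.append((train, sorted(held_out)))
    return splits


def grid_search(n: int,
                score: Callable[[int, int, List[int], List[int]], float],
                k: int = 10,
                repeats: int = 10) -> Tuple[Tuple[int, int], Dict[Tuple[int, int], float]]:
    """Return the (num_trees, mtry) pair with the lowest mean held-out RMSE."""
    splits = repeated_kfold(n, k, repeats)  # same splits for every grid point
    results: Dict[Tuple[int, int], float] = {}
    for num_trees, mtry in GRID:
        rmses = [score(num_trees, mtry, train, test) for train, test in splits]
        results[(num_trees, mtry)] = sum(rmses) / len(rmses)
    best = min(results, key=results.get)
    return best, results
```

Reusing the same 100 splits for all 40 grid points means the candidate settings are compared on identical data partitions. Any difference in mean RMSE therefore reflects the tuning parameters rather than the luck of the folds.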

Comparing landscape, year, and local predictors, we found generally that landscape variables and year explained more of the variation in early bee communities, while, in the late season, local variables and year were most important. However, the importance of local versus landscape scale was different for abundance and richness models (Fig. 2). For abundance of early bees and richness of late bees, some landscape and local variables were the most important predictors (Fig. 2). In spring, year was, by far, the most important predictor of species richness, with more species of early bees in 2019 than 2018 (Fig. 5). In addition to year, for richness of early bees, we observed weak effects of landscape composition (Fig. 2). Year was also the most important predictor of abundance of late bees, but, unlike spring richness, local quality was also important (Fig. 2).

Discussion

In this study, we found that bee abundance and richness responded to an interplay of landscape quality, local habitat quality, and year, with significant differences between bee communities in the early vs. mid to late season. We had predicted that early bees would depend more on local quality than late bees. Instead, we found that abundance of early bees and richness of bees in both seasons were associated with landscape quality. The key landscape factors were floral resources in the springtime and the percentage of landscapes occupied by wetlands, water, natural, and agricultural habitats. Abundance of late-season bees was significantly related to local conditions, including soil water content, cover of local vegetation, plant-community composition, and local-scale floral resources in the fall. Moreover, late bees were more abundant in local agricultural habitats, but late bee species richness declined dramatically at sites embedded in a highly agricultural landscape context (> 55% agriculture). Here, we discuss responses of early and late season bees to local and landscape characteristics, with recommendations for designing conservation practices and suggestions for future research.

At the local scale, early in the season, bee communities were reduced at sites with highly managed plant communities and displayed complex interactions with soil nutrient levels. Surprisingly, we found little influence of local floral availability or soil texture on early bee communities. These results suggest that, in our study region, early bees are not primarily influenced by the suitability of soil for nesting at each site, even though nearly three-quarters of wild bee species are ground-nesting (Pinilla-Gallego et al. 2018; Harmon-Threatt 2020). Where the larger landscape provides sufficient nest sites, local soil characteristics (texture and water content) might be relatively unimportant. However, the ecological value of soil for bee species is difficult to quantify, because current soil maps have low resolution and we have limited understanding of the needs of ground-nesting bees (Cambardella et al. 1994; Moral et al. 2010; Harmon-Threatt 2020). The mixed effects of soil potassium and phosphorus on spring bee abundance are likely due to nutrient impacts on plants, rather than direct effects on the bees themselves. While soil fertility can influence plant community composition and richness (Tilman 1987; DiTommaso and Aarssen 1989; Wilson and Tilman 1991), in our data we did not find these correlations (Pearson’s r = -0.6, -0.52, and 0.4 between NMDS axis 1 and soil organic matter, total nitrogen, and potassium, respectively). Thus, soil nutrients may instead be influencing plant flowering, floral resource production, and pollinator visitation, as observed in other studies (Burkle and Irwin 2009, 2010; Cardoza et al. 2012; Vaudo et al. 2022). The flowering plants that spring bee communities largely depend on – spring flowering trees – are likely less abundant at managed sites, which may explain the negative impact of management on spring bee communities.

At the landscape scale, spring bee communities were positively associated with the presence of wetlands and natural habitat and with high variation in floral area, which indicates the presence of spring flowering trees. Indeed, we found that spring bee abundance was positively associated with floral resources provided by the entire plant community (wind- and insect-pollinated plants). It is well known that spring bees obtain the majority of their nutrition from flowering trees (Urban-Mead et al. 2021), including wind-pollinated trees (Kraemer and Favi 2010; Splitt et al. 2021; Urban-Mead et al. 2023). Our results suggest that short periods of mass flowering provided by a diversity of tree species may be as beneficial as consistently high levels of flowering for supporting spring bee communities (Hemberger et al. 2020). In our study region, natural habitats were primarily forests or forest edge and thus would include flowering trees, which could explain the positive association with natural areas in the landscape. Future efforts at pollinator habitat design and restoration in urban, agricultural, and natural landscapes should therefore include targeted efforts to increase the diversity and abundance of flowering trees.

The landscape-scale association of wetlands and surface water with spring bee communities is somewhat surprising, but it provides an interesting and relatively unexplored target for conservation practices. In our study region, wetland habitats, especially shrub and emergent wetlands, host a unique community of plants, including some herbaceous flowering plants not found in other habitats (Iverson et al. unpublished data). Several of the plant species associated with floodplain and emergent wetland habitats are recommended for pollinator plantings (e.g. Eutrochium purpureum, Helianthus decapetalus; Byers et al. n.d.). Others are closely related to known pollinator-attractive species (e.g. Bidens cernua, Hydrophyllum canadense, Stachys hispida), suggesting they might also provide pollen or nectar resources for wild bees (Tuell et al. 2008). Relatively few studies have documented wild-bee communities in wetlands, although most report rich bee communities in and around water and wetland habitats (Stewart et al. 2017; Vickruck et al. 2019), including some threatened species (Moroń et al. 2008). In our study, relatively small areas of wetland in a landscape correlated with higher bee abundance and richness, and bee richness decreased substantially at only 200 m from water. Thus, integrating small areas of wetland habitat into landscapes may dramatically improve spring bee populations. Moreover, wetlands host many other insect, bird, and amphibian taxa (Riffell et al. 2003; Gibbons et al. 2006; O’Neill and O’Neill 2010), so wetland restoration and conservation could contribute more broadly to biodiversity goals. These findings also confirm the importance of existing programs that protect wetland habitat from agricultural and urban development.

At the local scale, late-season bee communities were more abundant and diverse at managed sites (diversified vegetable farms and apple orchards) and at sites with more fall-flowering plants, and they showed complex associations with soil properties. In mid-summer, old fields had more floral resources than agricultural habitats (Fig. 6), but orchards and mixed vegetable farms hosted higher bee abundance. We speculate that, in agricultural habitats, flowers of crop plants and of weeds in fencerows, fallow areas, or between-row sections of fields are particularly important floral resources, as weeds provide valuable resources for bees (Norris and Kogan 2005; Bretagnolle and Gaba 2015; Requier et al. 2015; Rollin et al. 2016). Also, most of the farms and orchards we sampled were not intensively tilled or had substantial untilled area nearby, which likely benefits ground-nesting bee species (Ullmann et al. 2016). As discussed previously, the association of late bee species richness with soil parameters is likely due to the impacts of soil on plant community flowering.

At the landscape scale, richness of late-season bees was negatively associated with the amount of agricultural land. At first glance, the positive association of agricultural land with late bee abundance at the local scale and the negative impact of agriculture on diversity of late bees at the landscape scale appear contradictory. However, most of the agricultural land in our study region is intensively managed row crop agriculture, which primarily supports generalist, disturbance-tolerant species (Kennedy et al. 2013; Harrison et al. 2018; Grab et al. 2018). Thus, fewer species were likely present at sites embedded in high-agriculture landscapes. At the same time, our highly managed vegetable and orchard sites offered greater late summer and fall floral resources from flowering plants (likely primarily weeds). These resources attracted a greater abundance of the generalist species from the surrounding landscape (Tscharntke et al. 2005).

We observed notable differences between the two years of our study, with more species of early bees and fewer late bees in 2019 than in 2018. In 2019, early spring was warmer than in 2018, and logistical constraints forced us to delay sampling. These factors likely altered the composition of the bee community we sampled. We attribute the higher richness of early bees in 2019 to a better match between our sampling time and the flight period of most early bees (Kammerer 2021). The differences in the summer collections were also likely due to weather conditions in the previous year. The summer of 2018 was extremely rainy, with near-record high precipitation (DiLiberto 2018), which may have reduced foraging opportunities and negatively affected bee populations (Tuell and Isaacs 2010; Vitale et al. 2020). Evaluating the effect of weather conditions on bee communities requires multiple years of sampling, so we did not formally include these variables in this study, but such analyses have been conducted in other studies (Kammerer 2021; Filazzola et al. 2021).

Although we were able to provide important insights into how local and landscape variables influence bee communities, our study has several limitations. First, we sampled wild bees using pan traps, which perform poorly in areas with abundant floral resources and have some taxonomic biases, including under-sampling of large-bodied bees (Roulston et al. 2007; Baum and Wallen 2011). With pan traps, we could not determine whether the bees we sampled were nesting or foraging within our study site, or were en route to a different patch. We minimized, but could not completely exclude, the latter outcome by not using blue vane traps, which are extremely attractive to bees (Gibbs et al. 2017). Second, our estimates of local and landscape floral area assumed that flower production is solely a function of the composition of the plant community at a given location. In quantifying landscape floral area, we assumed that plant communities of the same habitat type had equal numbers of flowers. In reality, the number, size, and quality of floral resources are influenced by many factors, including temperature, moisture availability, and soil fertility (Cardoza et al. 2012; Mu et al. 2015). Third, because we analyzed our data with random forest models, we could not assess interactions between local and landscape quality. Local and landscape quality metrics were only weakly correlated across scales (minimum, mean, and maximum Pearson’s r = -0.6, 0.01, and 0.56, respectively, Figure S10). This potentially suggests independent effects of local and landscape quality, but we could not examine these questions. Finally, a persistent challenge in evaluating conservation plantings for pollinators has been to determine whether adding flowers for bees increases wild-bee populations or merely concentrates individuals in high-quality patches. Our analyses cannot definitively show whether high-quality sites improved bee fitness or simply drew in more of the individuals already present in the landscape.
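The cross-scale correlation summary above reports the minimum, mean, and maximum Pearson's r. The sketch below shows one way such a summary could be computed. It assumes the summary is taken over every pairing of one local variable with one landscape variable, which the text does not spell out. The variable names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_scale_summary(local: Dict[str, Sequence[float]],
                        landscape: Dict[str, Sequence[float]]) -> Tuple[float, float, float]:
    """Min, mean, and max Pearson's r over every local x landscape variable pair."""
    rs = [pearson_r(local[a], landscape[b]) for a, b in product(local, landscape)]
    return min(rs), sum(rs) / len(rs), max(rs)
```

A mean near zero can hide strong individual pairs, which is why the text reports the minimum and maximum alongside the mean.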

Conclusion

In this study, we identified key components of local and landscape quality for wild bees, generating essential targets for bee conservation programs. Building on previous research showing that bees respond to both local and landscape resources, we found that the most relevant spatial scale varies by season and can differ for bee abundance and richness. Early bees took advantage of resources – like spring flowering trees – that are present in surrounding landscapes, while late bees used floral resources provided at the local scale in the habitats we surveyed. Adding spring-blooming trees to habitat management and restoration schemes could thus improve outcomes for spring bee communities. Additionally, targeted additions or restorations of wetlands and surface-water features could provide conservation benefits to spring bees as well as to other plant and animal taxa. Late-season bees increase in abundance in areas with more summer-flowering plants. In highly agricultural landscapes, however, improving local quality for late bees will likely benefit mainly disturbance-tolerant, generalist species (Harrison et al. 2019). By considering spatial and temporal variation in resources, we developed context- and season-specific recommendations to improve habitat quality for wild bees. Such recommendations are a critical component of conservation efforts to offset the manifold stressors threatening these taxa.