Estimating bee abundance: can mark-recapture methods validate common sampling protocols?

Wild bees can be essential pollinators in natural, agricultural, and urban systems, but populations of some species have declined. Efforts to assess the status of wild bees are hindered by uncertainty in common sampling methods, such as pan traps and aerial netting, which may or may not provide a valid index of abundance across species and habitats. Mark-recapture methods are a common and effective means of estimating population size, widely used in vertebrates but rarely applied to bees. Here we review existing mark-recapture studies of wild bees and present a new case study comparing mark-recapture population estimates to pan trap and net capture for four taxa in a wild bee community. Net, but not trap, capture was correlated with abundance estimates across sites and taxa. Logistical limitations ensure that mark-recapture studies will not fully replace other bee sampling methods, but they do provide a feasible way to monitor selected species and measure the performance of other sampling methods.


INTRODUCTION
Estimating abundance is a fundamental component of population and conservation biology, yet for some organisms of conservation concern, abundance is essentially unknown (Henderson & Southwood 2016;Noon et al. 2012). This knowledge gap is concerning because many environmental assessment studies, including those that can prompt regulatory protections, require a clear understanding of abundance trends (Nichols & Williams 2006;Otis et al. 1978;Schwartz 2008;Williams et al. 2002). In recent years, evidence of declines in populations of bees and other pollinators has spurred strong interest in monitoring bee abundance over time and among habitat types (Packer & Darla-West 2021;Portman et al. 2020;Potts et al. 2010;Prendergast et al. 2020;Westphal et al. 2008). In the USA, nine bee species are now protected under the Endangered Species Act of 1973, and repeated calls for large-scale monitoring efforts led to the formation in 2020 of the US National Native Bee Monitoring Research Coordination Network (US Fish and Wildlife Service 2021; Tepedino & Portman 2021;Woodard et al. 2020). Monitoring bee populations has both ecological and economic incentives, due to the important role of bees as pollinators in natural, agricultural, and urban environments (Kearns et al. 1998;Potts et al. 2016).
The most common sampling methods in ecological or monitoring studies of bees include pan traps, vane traps, trap nests, netting from flowers, and observation (over specified time, transect, or area) (e.g., Packer & Darla-West 2021; Prendergast et al. 2020). Each of these methods provides a population index: they yield a count per unit of sampling effort, rather than an estimate of the number of individuals per unit area (McKelvey & Pearson 2001). Differences in counts over time, space, or species are nevertheless often interpreted as proportional differences in abundance or relative abundance of species (e.g., Moylett et al. 2020; Winfree et al. 2007). The validity of this interpretation depends on a consistent relationship between the index sample and the size of the underlying populations. In other words, capture probabilities must be consistent across time, habitats, and species (McKelvey & Pearson 2001; Slade & Blair 2000). Thus, even if a sampling method provides a satisfactory index for one species, differential capture probabilities across species or habitats may yield poor indices of community composition, that is, the relative abundances of the species in a community (Conroy 1996). Multi-species studies in other taxa, such as small mammals or terrestrial arthropods, have concluded that indices such as trap catch cannot represent the relative abundance of species, due to different species-specific capture probabilities (e.g., Andersen 1995; Hopkins & Kennedy 2004; Lang 2000).
The relationship between index samples and underlying community composition has not been examined in studies of bees. Few studies have examined this relationship for individual bee species (reviewed below), and none has done so at a community scale. Instead, community-level work typically compares two or more index methods to each other, and these studies have found that each method provides a different picture of the bee community (Gibbs et al. 2017; Prendergast et al. 2020; Westphal et al. 2008; Wilson et al. 2008). Moreover, circumstantial evidence suggests that the efficiency of bee sampling methods such as pan traps varies across habitats and taxa (Baum & Wallen 2011; Cane et al. 2000; Portman et al. 2020; Prendergast & Hogendoorn 2021; Wilson et al. 2008). Thus, capture probabilities clearly differ among sampling methods and likely differ among habitats and taxa. Despite this concern, there has been no attempt to quantify and account for differing capture probabilities in any bee community sampling method (Henderson & Southwood 2016; Packer & Darla-West 2021).
Mark-recapture methods are a common and often effective means to obtain estimates of animal abundance with explicit measures of precision (such as SE) (Henderson & Southwood 2016;Kéry & Royle 2016;Krebs 1999;Otis et al. 1978;Williams et al. 2002). While widely used in studies of mammals, birds, and amphibians, mark-recapture methods are rarely used to estimate bee abundance (Henderson & Southwood 2016;Nichols et al. 1981;Portman et al. 2020). Here, we review existing mark-recapture studies of bees to assess the potential of mark-recapture as a tool to estimate bee population sizes and relative abundances. We conclude with a case study in which we used mark-recapture methods to assess whether aerial netting and pan trapping yielded reliable indices of abundance and relative abundance for sampled species in the community.

SCOPE OF LITERATURE REVIEW
We searched Clarivate Analytics Web of Science and Google Scholar databases in January 2021 to identify studies that used mark-recapture methods to estimate bee abundance. We used the following keyword combination for Web of Science: [("mark recapture" OR "capture recapture" OR "mark release recapture" OR batchmark) AND (apoidea OR bees OR bee)]. Google Scholar search phrases included "bee mark recapture population estimate," "bumblebee mark recapture," and "capture recapture bee population". We found additional articles through cited references in the primary studies.
We recovered 20 mark-recapture papers that calculated population estimates of bee species (Table I). We identified 37 additional mark-recapture papers that did not calculate population estimates, and these are listed in Text S1. The geographic scope of the 20 reviewed studies was limited: 50% were completed in Europe, 30% in North America, 10% in South America, and 10% in Central America (Table II). No studies were completed in Asia, Africa, or Australia.

TARGET SPECIES
If mark-recapture experiments are to be used to validate other multi-species sampling methods, they must be feasible for a broad range of bee diversity. To assess the breadth of prior mark-recapture studies, we classified the target species in each study according to their taxonomy (family), social organization (Text S2), nesting substrate (Text S2), and body length (Text S2). The reviewed studies sampled a total of 24 bee species, henceforth target species. Of the 24, 79% were in the family Apidae, 8% Andrenidae, 8% Megachilidae, and 4% Colletidae (Table II). The families Halictidae, Melittidae, and Stenotritidae were not sampled. However, one study that did not estimate population size collected mark-recapture data for three halictid species, indicating that population estimates should be feasible for this bee family (Cecala & Wilson Rankin 2020).
Target species represented a range of social behaviors. Specifically, 38% of target species were social, 13% broadly parasocial (semisocial or communal), 42% solitary, 4% parasitic, and 4% had multiple social behaviors (Table II; Text  S2). The target species also exhibited diverse nesting behavior, with 46% (mostly Bombus and euglossine bees) nesting in pre-existing cavities, 21% in soil, 8% in wood, and only 4% in stems (Table II; Text S2). Twenty-one percent of the target species did not belong to a single preferred nesting category or had unknown preferences. To assess body size, we classified each species as small (≤ 7 mm body length), medium (7-12 mm), or large (≥ 13 mm) (Quistberg et al. 2016). Most target taxa (79%) were large, 13% medium, 4% both large and medium (when multiple species were pooled in a target genus), and only 4% small-bodied (Table II; Text S2). Often, researchers obtained separate abundance estimates for male and female bees, owing to differences in their phenology and behavior (Table III).
Although mark-recapture methods have been used to estimate population sizes in taxonomically and ecologically diverse bee species, the   (Portman et al. 2020;Roulston et al. 2007). Moreover, several endangered bees are small-bodied species in the genus Hylaeus (USFWS 2016); conservation efforts could directly benefit from non-destructive population estimates generated from mark-recapture experiments. There is clearly a need to develop and test mark-recapture methods suitable for small-bodied species.

MARKING TECHNIQUES AND ASSUMPTIONS
Researchers marked bees with a variety of permanent and impermanent methods. Impermanent methods included marking the dorsal thorax of each bee with colored marker pens, paints, enamels, or numbered plastic tags (Table I). Despite the potential for impermanent marks to fall off or fade, they can last for at least several weeks. For instance, euglossine bees marked with paint or plastic tags have been recovered 7-8 weeks after their initial capture (Ackerman & Montalvo 1985;Pokorny et al. 2015). Permanent marking methods included abdominal hair shaving and removal of a tarsal segment (Janzen 1981;Tosta et al. 2017). Although hair-shaved bees have been recovered up to 6 weeks after marking (Janzen 1981), even permanent marks could eventually be obscured by bodily wear or injury.
No matter the permanence of the marking method, physical marks can be applied individually or in batches. With individual marking, every captured bee receives a unique identifier such as a numbered tag, a distinctive combination of paint dots, or a unique hair shaving pattern. With batch marking, multiple individuals receive the same marking pattern, which represents only their collective location or time of capture. Batch marking has been performed with impermanent colored paints, enamels, and leg segment removal. Finally, genetic mark-recapture methods do not require that insects be physically marked and instead identify individuals or siblings by genotyping, often using a nonlethally sampled leg segment. Although genetic mark-recapture can, in principle, be used for any species (Pennell et al. 2013), in studies of bees, it has been used exclusively for bumble bees with the goal of estimating the total number of colonies (not individual bees) present at a study site (Goulson et al.).
All studies assumed that marks were not lost over time, and all but one study assumed that marking did not affect a bee's survival or behavior. However, it is important to explicitly test these assumptions because paints and markers often contain toxic materials, and some permanent marking methods may create wounds that can become infected (Henderson & Southwood 2016). In addition, colorful marks could cause bees to be more vulnerable to predation or alter their thermoregulatory ability. Nevertheless, few studies examined the validity of these assumptions, either directly (e.g., laboratory or field studies) or indirectly (e.g., goodness-of-fit tests).
In laboratory studies, cellulose paint markings and thoracic scratch marks were not lost over time and did not affect bee longevity, although the authors provided few details about their methods (Danks 1971; Pokorny et al. 2015). No bee study explicitly tested mark longevity in the field, but this could be done with double-marking, which allows researchers to detect mark loss (Henderson & Southwood 2016; Henry & Jarne 2007; Seber & Felton 1981).
A double-marking system could also be used to jointly test the effects of marking on bee survival and behavior (Henderson & Southwood 2016; Seber & Felton 1981). With this approach, two marking techniques are used individually and concurrently. For example, bees could be marked with either egg albumin protein powder that is undetectable to the naked eye, traditional paint marks, or both. If bees have similar recapture rates regardless of mark type, then the type of mark applied likely has no impact on the survival or behavior of the bees. Bischoff (2003) took another approach, using a χ² test to determine whether capture rates of Andrena vaga individuals differed among unmarked, once-marked, or multiple-marked bees. Capture probability declined only after eleven markings, suggesting that the impact of marking on bee behavior was likely small.
Small-bodied bees appear to be underrepresented in the mark-recapture literature, perhaps because of challenges inherent in marking very small animals. However, several novel marking techniques could be useful in future work on small-bodied species. Microdots are small polymer discs (0.5-mm diameter) that can display up to 26 characters of information. Whitehead & Peakall (2012) successfully used microdots in a mark-recapture study of a 9-mm-long parasitoid wasp species. Similarly, protein powder marks have been developed for other mark-recapture studies and may also be useful for batch marking in abundance studies. Boyle et al. (2018a, b) and Jones et al. (2006) have shown that protein powders can be applied directly to adult bees or to cocoons, that they persist through repeated exposures to simulated rain, and that they can be sampled nonlethally from recaptured bees. The recovered proteins are then identified by ELISA, which requires a relatively modest investment in equipment. Protein marking is now a reasonable option for mark-recapture studies targeting potentially vulnerable wild bee species.

MARK-RECAPTURE MODELS
Mark-recapture models can be divided into two broad categories: closed and open. Within these categories, the 20 studies used seven modeling approaches and methods (Table III), generating population estimates using a wide range of sample sizes (number of captures, n = 3 to 959; mode = 23) (Fig. 1). Sixty-one percent of the sample sizes were ≤ 50 and 73% were ≤ 100. All the mark-recapture models described below, except for genetic mark-recapture models, can be used with batch-marked bees. Details of these models can be found in Craig (1953), Henderson & Southwood (2016), Krebs (1999), and Matter & Roland (2004).

Closed population models
Closed mark-recapture models assume that there are no births, deaths, immigrations, or emigrations during the period of study (Henderson & Southwood 2016). The closed models that have been used in bees include the Craig (Craig 1953), Lincoln-Petersen (Lincoln 1930), Schumacher-Eschmeyer (Schumacher & Eschmeyer 1943), and K-sample capture-recapture models (Otis et al. 1978;Williams et al. 2002). These models share three main assumptions: (1) the population is closed throughout the study period, (2) bees do not lose their marks over time, and (3) all marks are correctly noted at each sampling occasion (Henderson & Southwood 2016;Krebs 1999;Matter & Roland 2004;Nichols et al. 1981). The Craig, Lincoln-Petersen, and Schumacher-Eschmeyer methods assume a constant and equal capture probability for all individuals, and lack goodness-of-fit (GOF) tests that would suggest whether assumptions were met (Henderson & Southwood 2016;Krebs 1999).
K-sample capture-recapture models, which we employed in the case study below, relax this assumption and incorporate GOF tests (Otis et al. 1978). This method uses a family of closed population models where each member of the family holds different assumptions about the potential sources of variation in capture probability (Nichols et al. 1981). Commonly used models include M0 (which assumes a constant capture probability), Mt (time-varying capture probability), Mb (behavioral response to capture impacts capture probability), and Mh (heterogeneity in capture probability among individuals for unknown reasons) (Otis et al. 1978). Mixed models that combine assumptions are also frequently used. By systematically testing each member of the family of closed population models, the model that best fits the data is identified (Henderson & Southwood 2016). This approach improves upon older closed population models, but functions best when at least four or five sampling periods are completed (Henderson & Southwood 2016; Krebs 1999).
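To make the simplest member of this family concrete, the following is a minimal Python sketch of a profile-likelihood estimate under model M0 (constant capture probability). It is an illustration only and not the implementation in program CAPTURE or MARK; the function name, argument names, search bound, and example counts are all hypothetical.

```python
import math

def m0_estimate(n_distinct, n_captures, n_occasions, max_factor=50):
    """Profile-likelihood estimate of population size N under model M0.

    n_distinct  : number of different individuals caught at least once
    n_captures  : total captures summed over all occasions
    n_occasions : number of sampling occasions (K)
    max_factor  : search stops at max_factor * n_distinct (assumed bound)
    Returns (N_hat, p_hat).
    """
    if n_distinct < 1 or not n_distinct <= n_captures <= n_occasions * n_distinct:
        raise ValueError("inconsistent capture summary")
    best_n, best_ll = n_distinct, -math.inf
    for n_pop in range(n_distinct, max_factor * n_distinct + 1):
        # For a fixed N, the maximum-likelihood capture probability
        # is total captures divided by total capture opportunities.
        p = n_captures / (n_occasions * n_pop)
        misses = n_occasions * n_pop - n_captures
        # Log-likelihood up to a constant that does not depend on N or p.
        ll = math.lgamma(n_pop + 1) - math.lgamma(n_pop - n_distinct + 1)
        ll += n_captures * math.log(p)
        if misses > 0:
            ll += misses * math.log(1.0 - p)
        if ll > best_ll:
            best_n, best_ll = n_pop, ll
    return best_n, n_captures / (n_occasions * best_n)

# Hypothetical summary: 30 different bees, 40 captures, 3 sampling days.
n_hat, p_hat = m0_estimate(30, 40, 3)
print(n_hat, round(p_hat, 3))
```

If no individual is ever recaptured, the likelihood keeps increasing with N and the search simply returns its upper bound, which is one way to see why some estimates cannot be generated when recapture rates are low.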
Four studies used genetic mark-recapture techniques to compute the population size of social bee species (Table III). These studies also assumed a closed population yet required no physical marking (Miller et al. 2005). All four genetic mark-recapture studies used the two innate rates model (TIRM), which allows individuals to be captured multiple times per sampling session and does not assume equal capture probability among all individuals (Pennell et al. 2013).
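For comparison with the K-sample and genetic approaches, the simplest closed model named in this section, the two-sample Lincoln-Petersen estimator, can be sketched in a few lines of Python. The function name, argument names, and example counts are hypothetical and are not taken from any reviewed study. The Chapman bias-corrected form is included as a commonly used variant, not as a method the reviewed studies necessarily applied.

```python
def lincoln_petersen(marked_first, caught_second, recaptured, chapman=True):
    """Two-sample closed-population estimate of N.

    marked_first  : bees marked and released on the first occasion (M)
    caught_second : bees caught on the second occasion (C)
    recaptured    : marked bees among the second catch (R)
    """
    if recaptured > min(marked_first, caught_second):
        raise ValueError("recaptures cannot exceed either sample")
    if chapman:
        # Chapman's modification: defined even when R = 0.
        return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1
    if recaptured == 0:
        raise ValueError("classic estimator is undefined with zero recaptures")
    # Classic form: N = M * C / R
    return marked_first * caught_second / recaptured

# Hypothetical counts: 40 marked, 35 caught later, 10 of them marked.
print(lincoln_petersen(40, 35, 10, chapman=False))  # 140.0
print(round(lincoln_petersen(40, 35, 10), 1))       # 133.2
```

Both forms rest on the equal-catchability assumption described above, so the output is only as trustworthy as that assumption.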

Open population models
Open mark-recapture models assume that populations do change during the study period. In our review, two types of open mark-recapture models were used to estimate population size: the Manly-Parr and Jolly-Seber methods (Jolly 1965; Manly & Parr 1968; Seber 1965). The Manly-Parr method requires multiple sampling events and allows mortality to vary with age (Manly & Parr 1968). While this method does require sampling a relatively high proportion of the total population (> 25% for populations with less than 250 individuals and > 10% for larger populations) and can produce larger standard errors than other open population models, it is particularly suited to populations where mortality may be related to age (Henderson & Southwood 2016).
Figure caption (fragment): The figure excludes four studies that used genetic mark-recapture methods and one study that did not report sample sizes.

The Jolly-Seber method was the most commonly used population model in the reviewed studies (Table III). This method includes multiple versions such as the POPAN, Link-Barker, Burnham JS, Pradel-recruitment, and Pradel-λ models (Henderson & Southwood 2016). The Jolly-Seber method requires multiple sampling periods and has six main assumptions: (1) all individuals have an equal probability of capture, (2) every marked animal has the same probability of survival, (3) marks are not lost, (4) marks are not overlooked at capture, (5) all captured individuals have the same probability of being returned to the population, and (6) sampling time is negligible in relation to intervals between samples (Krebs 1999;Nichols et al. 1981). Additionally, the Jolly-Seber method assumes that all individuals are captured from one freely mixed population that is located within a single area and that all losses through emigration or death are permanent (Jolly 1965;Nichols et al. 1981).
Population estimates generated from the Jolly-Seber method are usually reliable if more than 9% of the total population is sampled and the survival rate from one sampling period to the next is not less than 0.5 (Bishop & Sheppard 1973). While the Jolly-Seber method is a useful mark-recapture model, it may be less accurate when the mortality of the study organism is related to age (Henderson & Southwood 2016). Some bee species do have increased mortality with age (e.g., Goldblatt & Fell 1987), indicating that the Jolly-Seber method may be inappropriate for them unless the duration of the study is very short compared to the adult lifespan and mortality is primarily due to predation or environmental stressors. Researchers are now developing Jolly-Seber models with age-dependent processes, but these novel approaches have so far relied on individual marking and have not been applied to bees (e.g., Hostetter et al. 2021).
To determine whether age-dependent mortality is a concern, researchers can evaluate the survivorship of their target species over time. For example, to determine if the Jolly-Seber or Manly-Parr method was appropriate for a mark-recapture study of the butterfly Maniola jurtina, Brakefield (1982) modeled survivorship curves for the butterfly by first releasing a cohort of freshly emerged marked individuals and tracking their recaptures over time. The butterflies initially exhibited a period of constant survival, followed by a period of senescence. Similar approaches could be used in studies of bees to identify time periods during which the Jolly-Seber model is appropriate. Although none of the bee mark-recapture studies that used the Jolly-Seber method plotted survivorship curves for their target species, some experiments were completed over a very short period of time, as little as 8 days (Table I) (Tosta et al. 2017; Yamamoto et al. 2014). However, other studies that lasted weeks or months may be subject to more substantial effects of age-dependent mortality, and future work would benefit from explicit tests of age-dependent survival.

Fit and precision
Few studies reported whether their data met mark-recapture model assumptions. Even when GOF tests or model selection were employed, the results of fit tests were not reported (Bischoff 2003; Goulson et al. 2010; Hennessy et al. 2020, 2021). In the absence of GOF tests, or direct field tests of model assumptions, the validity of the resulting population estimates is unknown, and these oversights must be explicitly addressed in future studies. Assuming that model assumptions were met, the precision of the resulting estimates is of interest. If an intense sampling effort produces imprecise estimates, the cost of the estimate may exceed its usefulness. The coefficient of variation (CV) is a common measure of precision, computed as the standard error of the estimated population size divided by the estimate itself, often presented as a percentage. Smaller values indicate greater precision. The CV depends on sample size, population parameters such as capture and survival probabilities, and the specific model employed. As a rule of thumb, Pollock et al. (1990) suggested that a CV of 20% should be a reasonable target when planning a mark-recapture study. Not all reviewed studies reported the CV or information required to compute it. Among those that did, the average CV across estimates was often larger than 20%, but nearly every study, excluding those of male orchid bees (Eulaema, Euglossa), yielded at least some estimates with high precision (Table III). Although bee population estimates will often be noisy, the range of detection probabilities (0.06-0.93) and CVs (Table III) they report broadly overlaps those in similar studies in other taxa (e.g., Bailey et al. 2004; Haddad et al. 2008; Nichols et al. 1981), suggesting that informative population estimates are possible and can be upweighted in downstream analyses or decisions (e.g., Bolam et al. 2019).
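As a small worked illustration of the CV definition given above, the Python sketch below computes the CV as a percentage and compares it with the 20% planning target attributed to Pollock et al. (1990). The function name, taxon labels, and numbers are hypothetical and are not values from Table III.

```python
def coefficient_of_variation(n_hat, se):
    """CV (%) = 100 * SE of the population estimate / the estimate."""
    if n_hat <= 0:
        raise ValueError("population estimate must be positive")
    return 100.0 * se / n_hat

# Hypothetical (label, estimate, standard error) triples.
estimates = [("taxon A", 120.0, 18.0), ("taxon B", 45.0, 15.0)]
for label, n_hat, se in estimates:
    cv = coefficient_of_variation(n_hat, se)
    verdict = "meets" if cv <= 20.0 else "exceeds"
    print(f"{label}: CV = {cv:.1f}% ({verdict} the 20% target)")
```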

Future directions in modeling bee population size
Developments in modeling approaches have made it possible to handle and account for key assumptions (e.g., population closure), incorporate sources of variation (e.g., heterogeneity), and use additional information (e.g., locational data) to derive more ecologically meaningful population parameters (Borchers & Fewster 2016;Kéry & Royle 2016;Williams et al. 2002). For example, Pollock (1982) described the robust design to compute population size. This method uses a nested sampling structure, composed of a series of closely spaced mark-recapture samples, which are separated by longer time intervals. K-sample closed population capture-recapture models such as those described by Otis et al. (1978) are applied to the groups of closely spaced samples. During the longer time intervals, the population is assumed to be open (Henderson & Southwood 2016;Pollock 1982). By combining these techniques, the robust design can reduce bias caused by unequal catchability and provide estimates for population parameters (such as birth rate and population size) in the first primary sampling period that cannot be generated with open population models alone (Henderson & Southwood 2016;Pollock 1982).
Another area of major advances in model development that might be applicable to bee research is spatial mark-recapture models (Borchers & Fewster 2016). Bee researchers have yet to apply these types of models because mark-recapture studies typically capture bees at a single resource (flowers or nests) that is clumped in the landscape. Estimates of the population at a given resource patch are likely unique to that patch and do not represent landscape-scale densities. However, if appropriate and feasible, spatial mark-recapture models are a powerful research tool because they yield explicit estimates of population density, not just population size (N) within an area of interest (Sollmann et al. 2012). Spatial mark-recapture models account for heterogeneity in capture probability arising from the interaction between animal activity centers and sampled locations (Borchers & Fewster 2016; Chandler & Royle 2013). Generally, spatial recaptures require that the identity of each bee be known to obtain locational (spatial) data. Marking bees can be challenging, particularly for smaller-sized bees. For such cases, Chandler and Royle (2013) and Augustine et al. (2018) have developed methodological approaches to deal with unmarked or partially marked populations (e.g., batch marking). Data requirements consist of spatially referenced counts made on one or multiple occasions at closely spaced sample units such that individuals in the population are encountered at multiple locations (Chandler & Royle 2013). However, spatial mark-recapture models still assume that movements and resource use are independent among individuals; thus, if the assumption of independence is not met in social bee studies, estimates of population abundance might be biased with inflated variance (overdispersion).
Finally, in social bee species, there is a distinction between reproductive population (number of colonies) and total population (number of workers). Depending on the study goals, either parameter may be of interest. Both genetic and non-genetic mark-recapture methods can provide these estimates. Colony-level models have been applied only to Bombus and not to other social or parasocial species, such as many halictids and euglossines.

In these taxa, where the morphological distinction between castes is subtle or absent, genetic mark-recapture holds promise for providing estimates of the reproductive population size. Even when a total worker population size is desired, genetic mark-recapture may help overcome difficulties with physically marking small insects.

USING MARK-RECAPTURE POPULATION ESTIMATES TO VALIDATE INDEX SAMPLES
Given that mark-recapture methodology is established in bee studies and has produced reasonably precise estimates, it may be a viable method to evaluate the biases of index sampling methods. Only four studies, however, compared mark-recapture population estimates to index sampling methods, including nest counts, survey walks, and pan traps (Table IV). All studies made this comparison within individual species, not among members of a community. Three studies compared population estimates to the number of bees counted during survey walks, with mixed results. In the solitary bee species Andrena hattorfiana, survey counts and population estimates were highly correlated (r = 0.99, p < 0.001) (Larsson & Franzén 2008). However, in the solitary Eucera pruinosa, Tepedino (1981) concluded that counts during survey walks overestimated population size early in the season, when few flowers were open and the same bees were counted repeatedly as they moved from flower to flower. The third study addressed two social species (Bombus hortorum and Bombus terrestris) and compared the numbers of bumble bees observed on transect walks, the number caught in pan traps, and the number of colonies estimated using genetic mark-recapture. These three metrics were uncorrelated, but it is perhaps not surprising that forager number and nest number were unrelated since nest size can vary (Wood et al. 2015).
Finally, one 2-month study of Andrena vaga compared repeated nest counts to the estimated number of females in the population. Both metrics fluctuated through the season, but were not correlated (Bischoff 2003). Many females detected early in the season likely failed to nest, whereas some females built more than one nest, such that population size and nest counts represented different aspects of the species' demography. Thus, two of the four studies compared population estimates to index samples that represented entirely different demographic rates (colony number vs. worker number; total females vs. nests) and thus tell us little about the validity of those index samples. Mark-recapture methods are clearly underutilized as a means to understand the performance of other common methods such as aerial netting and pan trap sampling.

CASE STUDY
Although mark-recapture methods in studies of bees have been employed since the 1950s, they have rarely been used to inform community-level work. Five of the reviewed studies attempted to compute population estimates for multiple closely related target species, but none obtained estimates for more than two species in a single study (Table III). To assess the feasibility of multi-species mark-recapture in bee studies and to compare these population estimates to other sampling methods, we conducted a mark-recapture study in 2020 at three study sites in the Butner-Falls of Neuse Game Lands. Sites were near Raleigh, NC, USA ("site A," 36.0278, −78.6715; "site B," 36.0040, −78.6813; and "site C," 36.0276, −78.6661). Sites A and C were about 0.5 km apart, while site B was 3 km away. Each study site was located in a clearing within a pine forest that had undergone prescribed fire in the past three years; vegetation included grasses, forbs, shrubs, and tree saplings (Fig. 2). We selected these sites due to their association with an ongoing study by the North Carolina Wildlife Resources Commission (NCWRC).

Methods
The study was divided into five two-week sampling periods from July 6 to Sept. 11, 2020. During each period, we dedicated 3 days to mark-recapture activity, followed by 1 day of sampling with pan traps and aerial netting. On mark-recapture days, we captured bees using insect nets. All bees were netted within 50 m of a permanent pan trap transect. Sampling occurred during clear weather conditions from 10:00 am to 4:00 pm. Four bee taxa were captured: eastern carpenter bees (Xylocopa virginica), leafcutting bees (Megachile spp.), two-spotted longhorn bees (Melissodes bimaculatus), and green sweat bees (Augochlorella spp., Augochlora pura, and Augochloropsis spp.). Each site received 30 person-minutes of active search time in the morning (10:00 am-1:00 pm) and 30 person-minutes in the afternoon (1:00 pm-4:00 pm). We visited the sites in a different order each sampling period. After capture, we transferred bees into a "bee squeezer" (Kearns & Thomson 2001), where we marked them on the dorsal thorax with a dot of color using Posca extra fine-point paint pens (Uni Mitsubishi Pencil, Tokyo, Japan; Fig. 2). A different paint color was used on each sampling day. Bees recaptured on days two and three of the survey were marked with an additional spot of paint and released.
On pan trap and net sampling days, 30 pan traps were placed approximately 30 cm off the ground along a fixed 10 m × 45 m transect. The transect consisted of 10 pan trap triplets (blue, yellow, white) arrayed in 10 parallel rows. All pan traps were spaced 5 m apart within and between rows. Traps were deployed for about 10 h per day during daylight hours. While traps were deployed, we sampled by aerial netting within 50 m of the transect. Each study site received 30 min of active search time in the morning (~ 10:00 am-1:00 pm) and 30 min in the afternoon (~ 1:00 pm-5:00 pm). The lag time between the mark-recapture and pan trap and net sampling ranged from 1 to 7 days, with a mean of 3.6 days across all five sampling periods. This variation was unavoidable since different entities performed these methods and did so within weather constraints.
We estimated population sizes using program CAPTURE for closed population K-sample models featured in program MARK (Otis et al. 1978; White & Burnham 1999). Population closure was reasonable for our study because each 3-day mark-recapture sample was completed within a 5-day period, which is short relative to the adult activity period of the focal bee taxa. We attempted to generate population estimates for every unique combination of target taxon, study site, and survey week; however, some estimates could not be generated due to low capture or recapture rates. We assessed whether models met their assumptions using chi-square goodness-of-fit (GOF) tests. We generated observed and expected frequencies from the most general (global) model in the model set using program MARK (i.e., output/specific model output/residuals). We summed the chi-square values and computed a probability with degrees of freedom equal to the number of encounter histories minus one. If expected values were small (i.e., < 2), we pooled encounter histories to minimize spurious chi-square results. For index samples to provide a reliable indicator of community composition, each species must have similar detection probabilities in the index sample, such that index captures and population estimates fit a single linear relationship for all species in the community.
To test this relationship for our focal taxa, we conducted linear regression analyses in which mark-recapture estimates were the predictor, and net and trap capture were the responses, with one data point per site-species combination. Specifically, for each site-species combination, we summed pan trap and aerial netting captures across sampling periods, and we took a weighted average of all the mark-recapture estimates across sampling periods (weighted by the reciprocal of the width of the 95% confidence interval, after log transformation). Because mark-recapture was performed on different days than netting and pan trapping, this pooled dataset prevented daily weather or other transient conditions from obscuring overall relationships between the samples and population estimates at the study sites.
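The pooling step described above can be illustrated with a minimal Python sketch. It assumes one reading of the weighting rule: both the estimates and their 95% confidence limits are log-transformed, each estimate is weighted by the reciprocal of its log-scale interval width, and the weighted mean is back-transformed. The function name, the back-transformation, and the example values are hypothetical and are not taken from the study.

```python
import math

def pooled_estimate(estimates):
    """Weighted mean of log-transformed population estimates.

    `estimates` holds one (n_hat, ci_lower, ci_upper) tuple per sampling
    period. Each weight is 1 / (log(ci_upper) - log(ci_lower)), so periods
    with narrower confidence intervals count more. The result is
    back-transformed to the original scale (an assumption of this sketch).
    """
    weights = []
    log_values = []
    for n_hat, lower, upper in estimates:
        width = math.log(upper) - math.log(lower)
        if width <= 0:
            raise ValueError("confidence interval must have positive width")
        weights.append(1.0 / width)
        log_values.append(math.log(n_hat))
    weighted_log_mean = (
        sum(w * v for w, v in zip(weights, log_values)) / sum(weights)
    )
    return math.exp(weighted_log_mean)

# Hypothetical estimates for one site-species combination (not study data).
periods = [(40.0, 25.0, 80.0), (55.0, 45.0, 70.0), (30.0, 12.0, 110.0)]
pooled = pooled_estimate(periods)
```

In this illustration the second period has the narrowest interval, so the pooled value is pulled toward its estimate rather than toward the unweighted mean.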

Results and discussion
Of sixty possible site-species-date combinations, we generated 22 population estimates (Table V, Fig. 3). The CAPTURE model M0, which assumes a constant capture probability, was the most common model fit to the data. Twenty estimates were obtained from models that met assumptions (chi-square GOF, P > 0.05). Observed/expected values were insufficient to run GOF tests on the two remaining models. The precision of estimates was similar to that reported in prior studies of individual taxa (Table III), with CVs ranging from 0.12 to 0.90.
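The GOF check reported here follows the procedure given in the methods, and a minimal Python sketch of that procedure is shown below. It makes two assumptions: all encounter histories with expected frequencies below 2 are merged into a single pooled cell, and degrees of freedom are counted after pooling. The source does not specify how pooling was carried out. The function names and the observed and expected frequencies are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    with z = x / 2, starting from Q(1, z) = exp(-z) for even df or
    Q(1/2, z) = erfc(sqrt(z)) for odd df.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def gof_chi_square(observed, expected, min_expected=2.0):
    """Pearson chi-square GOF test over encounter-history frequencies.

    Cells with expected < min_expected are merged into one pooled cell
    (an assumption of this sketch). Returns (statistic, df, p_value),
    with df equal to the number of cells after pooling minus one.
    """
    kept = [(o, e) for o, e in zip(observed, expected) if e >= min_expected]
    small = [(o, e) for o, e in zip(observed, expected) if e < min_expected]
    if small:
        kept.append((sum(o for o, _ in small), sum(e for _, e in small)))
    df = len(kept) - 1
    if df < 1:
        raise ValueError("too few cells to test after pooling")
    statistic = sum((o - e) ** 2 / e for o, e in kept)
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical frequencies for the 7 observable histories of a 3-day sample.
observed = [12, 9, 11, 4, 5, 2, 1]
expected = [10.5, 10.5, 10.5, 4.5, 4.5, 1.9, 1.6]
statistic, df, p_value = gof_chi_square(observed, expected)
```

Under the criterion used in this study, a model would be treated as meeting assumptions when the resulting probability exceeds 0.05.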
Capture probabilities ranged from 0.04 to 0.52 across all sites and target taxa. Across sites, the average capture probability (mean ± SE) of green sweat bees was 0.21 ± 0.07, leafcutting bees 0.27 ± 0.06, two-spotted longhorn bees 0.36 ± 0.05, and eastern carpenter bees 0.23 ± 0.07. Across sites and dates, the standard sampling methods yielded a sample of 77 bees of the focal taxa by aerial netting and 42 in pan traps. Species-level identifications of captured green sweat bees and leafcutting bees are provided in Tables SIII-SVI. Eastern carpenter bees and leafcutting bees were rarely or never captured in pan traps, while other target taxa such as longhorn bees and green sweat bees were more commonly sampled (Fig. 3).
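To illustrate how a capture probability and a population estimate arise jointly under model M0, the following Python sketch maximizes the standard M0 likelihood (Otis et al. 1978) over integer population sizes. It is not the program CAPTURE implementation. The function name, the search bound, and the example counts are hypothetical. For a candidate population size N, with M distinct individuals caught, n total captures, and K occasions, the capture probability that maximizes the likelihood is n / (K N).

```python
import math

def fit_m0(distinct_marked, total_captures, occasions, max_multiplier=50):
    """Profile-likelihood fit of the closed-population model M0.

    distinct_marked: number of distinct individuals caught (M)
    total_captures:  captures summed over all occasions (n)
    occasions:       number of capture occasions (K)
    Searches integer N from M up to max_multiplier * M (an arbitrary bound
    chosen for this sketch) and returns (N_hat, p_hat).
    """
    if total_captures <= distinct_marked:
        # Without recaptures the likelihood keeps rising with N,
        # so no finite estimate exists.
        raise ValueError("no recaptures: population size is not estimable")
    if total_captures > occasions * distinct_marked:
        raise ValueError("more captures than occasions allow")
    best = None
    for n_pop in range(distinct_marked, max_multiplier * distinct_marked + 1):
        p = total_captures / (occasions * n_pop)
        # log of N! / (N - M)! plus the binomial capture terms
        log_lik = (
            math.lgamma(n_pop + 1)
            - math.lgamma(n_pop - distinct_marked + 1)
            + total_captures * math.log(p)
        )
        misses = occasions * n_pop - total_captures
        if misses > 0:
            log_lik += misses * math.log(1.0 - p)
        if best is None or log_lik > best[0]:
            best = (log_lik, n_pop, p)
    return best[1], best[2]

# Hypothetical 3-day sample: 66 distinct bees marked, 90 captures in total.
n_hat, p_hat = fit_m0(distinct_marked=66, total_captures=90, occasions=3)
```

The explicit error for samples without recaptures mirrors the situation noted in the methods, where some estimates could not be generated because capture or recapture rates were too low.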
The linear regressions of index samples on population estimates included nine data points each because the original 22 mark-recapture estimates correspond to nine site-species combinations (Fig. 4). Population estimates were correlated with aerial netting captures (R² = 0.51, r = 0.71, p = 0.03) but not pan trap captures (R² = 0.11, p = 0.39). The correlation between net capture and population size approaches r = 0.8, a guideline suggested by Slade and Blair (2000) to determine when an index sampling method provides a good indicator of population size.
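The relationship between the reported R² and r can be made concrete with a short Python sketch of simple linear regression. In a regression with one predictor, R² equals the square of the Pearson correlation, and the slope can be tested with a t statistic on n - 2 degrees of freedom. The function names and the nine data points are hypothetical and are not the study data.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r * r

def t_statistic(r, n):
    """t statistic for a correlation r from n points, on n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical pooled abundance estimates and net captures for nine
# site-species combinations (illustrative values only).
abundance = [12, 20, 35, 48, 60, 75, 90, 110, 140]
net_captures = [1, 4, 3, 9, 6, 12, 8, 15, 19]
slope, intercept, r, r_squared = simple_regression(abundance, net_captures)
t_value = t_statistic(r, len(abundance))
```

Applied to the values reported above, the square root of 0.51 is approximately 0.71, which matches the reported correlation for aerial netting.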
Collector bias is a common concern with aerial netting, but in this case, netting performed well relative to mark-recapture estimates even though these methods were conducted by different entities.
Taken together, these relationships suggest that, across sites and species, aerial netting captured bees in proportion to their underlying abundance and provided a reasonable index of community composition, whereas pan traps did not. While pan traps can be useful in bee diversity surveys (Prendergast et al. 2020; Westphal et al. 2008), our results support the concern that they should not be used as a quantitative index of population size or relative abundance of species. However, the pan trap samples collected during this study were small. Many authors have noted that the apparent efficiency of pan traps varies among years and habitats (Baum & Wallen 2011; Cane et al. 2000; Wilson et al. 2008), and further research is needed to identify conditions under which pan traps can and cannot provide a reliable index of bee abundance and composition.

CONCLUSIONS AND FUTURE DIRECTIONS
Our literature review and case study demonstrate that mark-recapture methods are feasible in taxonomically and ecologically diverse bees and can produce reasonably precise population estimates (Table III; Table V). Although labor-intensive to generate, mark-recapture population estimates could inform other sampling methods or conservation actions and are currently underutilized in the ongoing global discussion of bee sampling and monitoring methods. Despite their potential, mark-recapture methods in studies of bees do have several limitations and challenges, as well as exciting future directions, which we summarize below as a guide to planning future studies.

Challenges and future directions
Challenge: Small body sizes
Because small-bodied bees are frequently captured in some index sampling methods, such as pan traps (Roulston et al. 2007), future comparative work needs to tackle population estimates in these small-bodied groups. Our case study, as well as the work of Cecala & Wilson Rankin (2020), suggests that marking is feasible in these groups. Newer methods, including the use of microdots, protein powders, or genetic mark-recapture, may also improve the feasibility and precision of population estimates in small species.

Challenge: Species identification
For many bees, identification of species is nearly impossible without microscopic examination. Mark-recapture studies, which rely on field identification, are often limited in their ability to identify bees beyond genus or morphospecies. This limitation may hinder conservation studies focused on individual species that are difficult to identify in the field. Genetic mark-recapture methods have the potential to overcome this hurdle since nonlethal genetic samples could be taken from all individuals encountered, and their species identity confirmed by barcoding in the lab (Magnacca & Brown 2012). Even without this advance, mark-recapture experiments have useful applications for easily identifiable bee species and for validating the detection rates of other sampling methods at the genus level.

Future direction: Community-level studies
Since bees are often sampled as a community, where the relative abundance of different species is of interest, there is a need for continued work validating the assumption of equal detection probabilities that underlies community sampling methods. Our case study was limited in scope due to pandemic restrictions, but nevertheless demonstrates that community-level mark-recapture is a feasible goal: with two people working for 15 days in the field, we obtained 22 population estimates for four taxa. To our knowledge, this is the first example of a mark-recapture study in which separate population estimates were generated for multiple, unrelated bee species. A full-scale project could use similar approaches to test the performance of index sampling methods across habitats, taxa, and seasons.

Future direction: Mark-recapture models
Validity of mark-recapture estimates depends on meeting model assumptions, and future work on bees will benefit from explicit tests of assumptions in the field, and GOF tests at the analysis stage. We note that our GOF tests included model sets that contained models M0, Mb, and Mt, but Mh was included on only two occasions. Our inability to fit Mh suggests that larger capture/recapture samples were required to fit the model (Link 2004). There is also a need to quantify the rate of mark loss in the field, the effects of marking on bee survival and behavior in the field, and, for open population models, to assess the age-dependent mortality of study species. We outlined several approaches to these knowledge gaps, which should be a priority in future mark-recapture work on bees.
Among established models, the CAPTURE models for closed populations, as well as Jolly-Seber and Manly-Parr models for open populations, are sound choices for many bee studies; older models that lack GOF tests will rarely be as useful. However, bee research has not yet used the full range of mark-recapture models that may be appropriate to these taxa. Although a few bumble bee studies have begun to take a landscape-scale approach in their analyses of abundance and movement (Mola et al. 2020a, b), there is a need to develop and implement spatial mark-recapture models for bees. Similarly, the so-called robust design that incorporates both open and closed population models may be particularly relevant for bee studies, where populations may function as closed over short periods of a few days but are open over typical intersample periods of weeks to months.

Conclusion
In non-bee taxa, such as mammals and terrestrial invertebrates, the need to validate population indices has been recognized and addressed at least since the 1970s (McKelvey & Pearson 2001;Topping & Sunderland 1992). Despite a rapidly growing global interest in assessing and monitoring bee populations, and strong circumstantial evidence of bias in current sampling methods (Packer & Darla-West 2021;Portman et al. 2020;Prendergast & Hogendoorn 2021; but see Saunders et al. 2021), bee research has yet to fully tackle this issue. Numerous studies have compared different index sampling methods to one another and found that they differ, but the biases of each method are impossible to quantify without comparison to estimates of the underlying populations that account for detection probabilities. The resulting uncertainty hinders attempts to demonstrate how bee abundance and community composition shift as a result of environmental change. Although limitations of mark-recapture methods in studies of bees (labor intensiveness, limited species identification) mean that they will never fully replace other sampling methods, they do provide a feasible way to monitor selected species and to measure the performance of other sampling methods.

FUNDING
This work was supported in part by the USDA National Institute of Food and Agriculture, Hatch project 1018689 to E.Y. and by North Carolina State University.

DATA AVAILABILITY
The field datasets generated and analyzed during the current study are available in the DRYAD repository.

CODE AVAILABILITY
Not applicable.

DECLARATIONS
Ethics approval Not applicable.

Consent to participate Not applicable.
Consent for publication Not applicable.

Competing interests
The authors declare no competing interests.
Disclaimer Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Open access This article is licensed under a Creative Commons Attribution 4.0 International license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.