1 Introduction

Worldwide, colonies of the European honey bee Apis mellifera are managed for crop pollination (Free 1970) and honey production. In the absence of the parasitic mite Varroa destructor (Keogh et al. 2010; Rosenkranz et al. 2010), unmanaged honey bee colonies provide a substantial but unquantified amount of crop pollination in Australia (Goodwin 2012). Elsewhere in the world, the number of unmanaged honey bee colonies has declined precipitously after the establishment of V. destructor, with only partial recovery after several years (Rosenkranz et al. 2010). Reliable density estimates of feral honey bee colonies can help mitigate the expected loss of their pollination services, for example by quantifying the required increase in commercial hives (Utaipanon et al. 2019a). Density estimates can also provide insight into the effects of feral honey bees on native species and help to better support wild pollinator communities, nectarivores and hollow-nesting species (Paton 1993; Goulson 2003).

Estimating the population density of honey bee colonies has in the past relied on direct observations (e.g., McNally and Schneider 1996). However, the ability to visually locate all feral colonies is limited by the accuracy of human observation and by access to terrain (Oldroyd et al. 1997; Jaffe et al. 2010). Advances in genetic techniques could now allow the use of indirect methods to estimate population density, based on the genetic fingerprints of individual males (drones) caught in a trap. Drones are caught at drone congregation areas (DCAs): specific sites where drones from surrounding colonies aggregate and wait for queens on mating flights (Zmarlicki and Morse 2015; Forfert et al. 2016). Drones at DCAs are typically sampled with a Williams trap, a cylindrical net with lures impregnated with the synthetic queen pheromone E-9-oxo-2-decenoic acid (9-ODA) to entice drones into the trap (a diagram of the trap is given in the supplementary materials; Williams 1987; Butler et al. 1997; Utaipanon et al. 2019a). The trap can be elevated using a helium-filled weather balloon or mounted on a long pole.

Using genetic markers, the drones sampled in the trap are classified into likely sibships, from which the number of colonies represented in the sample can be estimated (Moritz et al. 2008; Jaffe et al. 2010; Arundel et al. 2014; Hinson et al. 2015; Utaipanon et al. 2019a, 2021a). Typically, 6–10 unlinked microsatellite loci are used, and sibships are assigned using the maximum likelihood methods implemented in the program COLONY (Wang 2004; Utaipanon et al. 2019a). To convert the number of sibship groups in the sample into a density, one also needs to know the distance over which drones are attracted to the DCA. Reported drone flight distances range from 900 m (Taylor and Rowell 1988) to 5 km (Ruttner 2015). A recent experimental trial using marked drones from a single hive found that marked drones could be trapped up to 3.75 km from their colony of origin, suggesting that the Williams trap samples colonies from an area of about 44 km² (Utaipanon et al. 2019b).
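The 44 km² figure follows from treating the trap's catchment as a circle whose radius is the maximum recapture distance. The minimal sketch below assumes a circular catchment (an idealisation; real catchments need not be circular) and shows how strongly the assumed area depends on the radius chosen from the reported range of drone flight distances. The function name is illustrative only.

```python
import math

def trap_catchment_area_km2(radius_km: float) -> float:
    """Area (km²) of a circular catchment of the given radius (km)."""
    return math.pi * radius_km ** 2

# Radii taken from the text: the maximum recapture distance of marked drones
# (3.75 km) and the reported range of drone flight distances (0.9 to 5 km).
for radius in (0.9, 3.75, 5.0):
    print(f"radius {radius:4.2f} km -> area {trap_catchment_area_km2(radius):5.1f} km²")
```

Because a density estimate divides the number of sibships by this area, moving across the reported 0.9–5 km range changes the estimate by a factor of about 30 for the same drone sample.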

The accuracy of estimating the number of colonies from the number of sibships in a drone sample is subject to two types of sampling error, referred to as ‘non-sampling’ and ‘non-detection’ errors. Non-sampling errors arise when colonies do not contribute any drones to the sample, either because their drones were not sampled by chance or because the colonies produced few or no drones (Utaipanon et al. 2019a). The latter colonies are likely to be small and weak, investing their resources in workers rather than in reproductive males (Free and Williams 1975; Page and Metcalf 1984). It has therefore been suggested that non-sampling error of this kind would be of little consequence (Utaipanon et al. 2019a, b). The non-sampling error is assumed to be small at both high and low colony densities if the sample is large enough (i.e., several thousand drones) (Utaipanon et al. 2021a). At low densities, all drone-producing colonies would be represented in the sample, and one can reliably conclude that colony density is ‘low’. Conversely, at high densities, the fraction of colonies not represented in a sample is small overall, and one could reliably conclude that the density of honey bee colonies in the area is ‘very high’. However, an empirical assessment of the non-sampling error would provide greater confidence in the method.
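The chance component of the non-sampling error can be made concrete with a simple occupancy calculation. This sketch assumes, hypothetically, that every colony is equally likely to be the source of each trapped drone and that drones are trapped independently; it ignores colonies that produce no drones. Under these assumptions, a colony is absent from a sample of n drones with probability (1 − 1/k)^n, where k is the number of contributing colonies.

```python
def expected_unrepresented(n_colonies: int, n_drones: int) -> float:
    """Expected number of colonies contributing no drones to a sample.

    Assumes every colony is equally likely to be the source of each trapped
    drone and that drones are trapped independently (a simplification).
    """
    p_absent = (1.0 - 1.0 / n_colonies) ** n_drones
    return n_colonies * p_absent

n_drones = 1500  # roughly the sample size per year in this study
for k in (50, 150, 300, 600):
    missed = expected_unrepresented(k, n_drones)
    print(f"{k:4d} colonies: {missed:7.2f} expected unrepresented ({missed / k:.1%})")
```

The expected fraction missed by chance depends on the ratio of colonies to drones, so whether it is negligible at a given density depends on how large the drone sample is relative to the number of contributing colonies.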

Non-detection errors occur when too few drones per colony are genotyped to group them reliably into sibships. Modelling suggests that, to keep non-detection errors below 10% when using 6–10 markers, the sample needs an average of 6 drones per colony, referred to as the ‘drone/colony ≥ 6 rule’ (Utaipanon et al. 2021a, b). This rule is supported by evidence from sub-sampling existing drone samples (Utaipanon et al. 2021a).
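The drone/colony ≥ 6 rule is a simple ratio check. The sketch below applies it to the sample sizes and COLONY sibship counts reported in Section 3.4; note that it uses the inferred number of sibships as the number of colonies, which is itself subject to the non-detection error the rule is meant to control. Function names are illustrative.

```python
import math

def drones_per_colony(n_drones: int, n_sibships: int) -> float:
    """Average number of genotyped drones per inferred sibship."""
    return n_drones / n_sibships

def meets_six_rule(n_drones: int, n_sibships: int, threshold: float = 6.0) -> bool:
    """True when the sample averages at least `threshold` drones per colony."""
    return drones_per_colony(n_drones, n_sibships) >= threshold

def drones_needed(n_colonies: int, threshold: float = 6.0) -> int:
    """Smallest sample giving an average of `threshold` drones per colony."""
    return math.ceil(n_colonies * threshold)

# Sample sizes and sibship counts as reported in Section 3.4.
for year, n_drones, n_sib in ((2018, 1566, 275), (2019, 1484, 332)):
    ratio = drones_per_colony(n_drones, n_sib)
    print(year, round(ratio, 2), meets_six_rule(n_drones, n_sib), drones_needed(n_sib))
```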

The reliability of this indirect estimate of colony density depends on the validity of several assumptions. One important assumption is that drones from all colonies arrive at the DCA at similar rates (Utaipanon et al. 2021a). This assumption would not be valid if, within the area of attraction, either colony size or the distance between the colony and the DCA influences the probability of representation in the drone sample. In addition, other factors such as the composition of the landscape, the configuration of suitable trees for nesting or colony density itself may all affect the distance over which drones are attracted to DCAs (Jensen et al. 2005).

Here, we set out to empirically quantify the non-sampling and non-detection errors involved in the indirect estimate of colony density using drone sampling. We investigate the assumptions that neither the distance to the DCA nor the colony size influences the number of drones per colony caught in a trap. We do this by comparing the genetic fingerprints of individual drones and the sizes of sibships trapped at a DCA with those of queens from known colonies within the 44 km² area around the DCA. Our goal is to provide stronger recommendations for an appropriate correction factor to convert the number of drone families sampled at a DCA into the density of colonies in the area around the trap.

2 Methods

We collected honey bee samples over 2018 and 2019 around the Waite Campus and Arboretum of the University of Adelaide, located in Urrbrae, a south-eastern suburb of Adelaide, South Australia. The Waite Arboretum and surroundings contain many old Eucalyptus trees, mostly E. cladocalyx, with hollows suitable for feral honey bees (The University of Adelaide 2017). We identified a DCA on the Waite sports oval (34° 58′ 11.86″ S; 138° 37′ 41.99″ E), based on attributes described by Galindo-Cardona et al. (2012), which include (1) a row of trees marking linear flight paths, (2) a large open clearing and (3) flat terrain (Figure 1).

Figure 1.

The survey area, location of the drone congregation area (DCA), and the located feral and managed colonies over 2 years. Some points represent multiple colonies that share a tree or apiary. Yellow points: ‘known colonies’ from which worker stings were sampled to infer a known queen genotype. Black points: ‘non-sampled colonies’, from which no sting samples were obtained. The aqua-coloured area was surveyed in 2018 and 2019, and the grey area in 2019 only. The blue circle delimits the 44 km² area around the DCA. Image taken from Google Earth Pro.

2.1 Sampling feral and managed colonies

Visual surveys were used to locate feral honey bee colonies within a 1.5-km radius of the DCA. In 2018, we covered an area of 1.34 km² with abundant suitable vegetation (> 100-year-old Eucalyptus trees), and in 2019, we expanded the survey to 7.65 km², covering more of the nearby residential areas (Figure 1). Lack of access to urban gardens to the west of the DCA prevented us from covering the entire 44 km² assumed to be within the trap range. The landscape to the east of the DCA contains an abundance of Eucalyptus trees, and this grassland forest covers about one-third of the 44 km² around the DCA. We used GPS to obtain the latitude and longitude of each feral colony found and measured its distance to the DCA using Google Earth Pro, v7.3.3.7699 (Google 2020). We based our estimate of colony density on all feral colonies observed.
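Distances were measured in Google Earth Pro; the same colony-to-DCA distance can also be computed directly from GPS coordinates. The sketch below uses the haversine great-circle formula on a spherical Earth, which is a substitute for, not a reproduction of, the tool used here. The DCA coordinates are those given above; the colony position is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation

def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# DCA coordinates from Section 2 (34° 58′ 11.86″ S; 138° 37′ 41.99″ E).
DCA_LAT = dms_to_decimal(34, 58, 11.86, "S")
DCA_LON = dms_to_decimal(138, 37, 41.99, "E")

# Hypothetical colony position, for illustration only.
colony_lat, colony_lon = -34.9650, 138.6250
print(f"{haversine_km(DCA_LAT, DCA_LON, colony_lat, colony_lon):.2f} km")
```

Over distances of a few kilometres, the spherical approximation differs from ellipsoidal measurements by well under 1%, which is small relative to GPS error under tree canopy.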

We inferred the genetic fingerprint of queens from a subset of located colonies (2018: 23 colonies; 2019: 34 colonies) based on DNA obtained from 20 to 30 worker stings per colony. These stings were obtained by waving a suede flag mounted on a 10-m extendable pole in front of the colony entrance (Williamson et al. 2019). We selected colonies to obtain a good range of distances to the DCA (Figure 1) but were limited by colony height (< 11.5 m above ground). Each sting was stored individually in a microcentrifuge tube in 100% denatured ethanol at − 20 °C until processing (see Section 2.4). In 2018, colonies were sampled 3–4 months, and in 2019 1–2 weeks, before the drone sample was taken at the DCA. Nineteen colonies (3 managed, 16 feral) were sampled in both years. Four feral colonies that had been sampled in 2018 were not sampled in 2019: three no longer existed and one had become inaccessible due to tree overgrowth. The 11 additional colonies sampled in 2019 were newly located colonies in the 44 km² area.

2.2 Estimating relative colony size

For each located colony, we used forager flight activity as a relative estimate of colony size. The significant relationship between the number of foraging workers and colony size has been described and used in previous studies (e.g., Danka et al. 1986; Beekman et al. 2004). We measured colony activity by counting the number of returning honey bees for 30 s on sunny afternoons between 12:30 and 16:30 h at ambient temperatures between 25 and 33 °C. We repeated this count three times on the same afternoon and used the average. We measured relative colony size on the 9th of April in 2018 (5 months before sampling drones) and on the 11th of November in 2019 (around 3 weeks after sampling drones). For the colonies sampled in both years, we calculated the correlation between the two years' measurements to determine whether these estimates were consistent over time. Relative colony size was not recorded for managed hives.
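The size index and its between-year check can be sketched as follows: each colony's index is the mean of its three 30-s counts, and consistency across years is assessed with Pearson's correlation coefficient. The counts below are hypothetical, and the analyses in this study were run in SPSS; this only illustrates the arithmetic.

```python
import math
from statistics import mean

def relative_colony_size(counts):
    """Mean number of returning foragers over repeated 30-s counts."""
    return mean(counts)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical triplicate counts for four colonies measured in both years.
counts_2018 = [[12, 15, 14], [30, 28, 33], [5, 7, 6], [20, 18, 22]]
counts_2019 = [[10, 13, 12], [25, 31, 29], [9, 8, 6], [15, 17, 19]]
size_2018 = [relative_colony_size(c) for c in counts_2018]
size_2019 = [relative_colony_size(c) for c in counts_2019]
print(round(pearson_r(size_2018, size_2019), 2))
```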

2.3 Sampling drones

At the DCA, we used a Williams trap to sample 1566 drones between 13:30 and 17:00 h on the 31st of October in 2018 and 1484 drones on the 23rd of October in 2019. Sampling days had clear skies, and maximum temperatures ranged between 33 and 35 °C. Drones were stored in 100% denatured ethanol at − 20 °C until processing.

2.4 DNA extraction and genotyping

The samples were processed following methods described by Utaipanon et al. (2019a, b). We extracted DNA from the hind legs of drones using Chelex (Walsh et al. 1991), and from the worker stings, which contain less tissue, using a high-salt extraction (Aljanabi and Martinez 1997). We genotyped samples at 8 microsatellite markers: A107, A24, A29, A113, A35, A79, A8 and B123 (Estoup et al. 1994; Solignac et al. 2004). All drones had at least 5 of the 8 markers successfully genotyped. Drones whose alleles were consistent with the genotypes of two or more known queens, and which could therefore be the offspring of more than one of them, were removed from analysis, except when using COLONY to infer sibships (n = 14, or 0.9% of all drones in 2018, and n = 7, or 0.5% of all drones in 2019). This did not affect detection, because all but one queen had contributed other drones.

We used the worker genotypes from the sampled stings to infer the genotype of each colony’s mother queen, by identifying the two maternal alleles per locus that were each present in the daughter workers at a frequency of approximately 0.5. For each colony, we excluded workers with outlier genotypes (0–1 per colony), i.e., those that did not fit the pattern of the other worker genotypes and may have been workers drifting in from unrelated colonies. If all workers carried the same allele at a locus, we concluded that the queen was homozygous at that locus. The queens whose genotypes were inferred from worker-sting DNA are referred to as ‘reconstructed’ or ‘known’ queens from ‘known colonies’.
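The reconstruction rule above (two maternal alleles each carried by about half of the workers; one allele carried by all workers indicating a homozygous queen) can be sketched as follows. This is a minimal illustration, not the authors' exact procedure. It assumes each worker genotype is stored as a pair of allele sizes per locus; the 0.95 homozygosity cut-off and the outlier rule (a worker sharing no allele with the inferred queen at some locus) are hypothetical choices. Paternal alleles that happen to match a maternal allele inflate carrier frequencies, which this simple ranking does not model.

```python
from collections import Counter

def reconstruct_queen(workers, homozygous_threshold=0.95):
    """Infer a queen genotype {locus: (allele, allele)} from worker genotypes.

    workers: list of dicts mapping locus -> (allele_bp, allele_bp) or None.
    At each locus, alleles are ranked by the fraction of workers carrying them;
    the top two are taken as maternal, unless one allele is carried by nearly
    all workers, in which case the queen is called homozygous.
    """
    queen = {}
    loci = set().union(*(w.keys() for w in workers))
    for locus in sorted(loci):
        typed = [w[locus] for w in workers if w.get(locus) is not None]
        if not typed:
            continue
        carriers = Counter()
        for genotype in typed:
            for allele in set(genotype):  # count each allele once per worker
                carriers[allele] += 1
        freqs = {a: c / len(typed) for a, c in carriers.items()}
        ranked = sorted(freqs, key=freqs.get, reverse=True)
        top = ranked[0]
        if freqs[top] >= homozygous_threshold or len(ranked) == 1:
            queen[locus] = (top, top)
        else:
            queen[locus] = tuple(sorted(ranked[:2]))
    return queen

def outlier_workers(workers, queen):
    """Indices of workers sharing no allele with the queen at some locus."""
    flagged = []
    for i, worker in enumerate(workers):
        for locus, queen_alleles in queen.items():
            genotype = worker.get(locus)
            if genotype is not None and not set(genotype) & set(queen_alleles):
                flagged.append(i)
                break
    return flagged
```

In practice, the queen would be re-inferred after removing flagged workers.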

2.5 Assessing the drone contribution of known queens

To investigate whether all known colonies contributed drones to the sample caught at the DCA, we searched the corresponding drone sample for genotypes of potential sons of the reconstructed queens, allowing an error of ± 1 bp for each allele. We also analysed the drone sample in COLONY, including the known queens as candidate mothers. In 2019, our manual assessment assigned 177 drones to known queens, whereas COLONY assigned 113 drones with a probability of > 90%. Our manual assignments, allowing ± 1 bp at each locus, were therefore more relaxed, and we used these in our analysis. We then quantified the percentage of known queens represented by 0, 1–5 and ≥ 6 drone offspring in the sample taken at the DCA.
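The manual search for potential sons can be expressed as a compatibility check. Because drones are haploid, a drone is a potential son of a queen if, at every locus typed in both, its single allele lies within ± 1 bp of one of the queen's two alleles. This sketch assumes genotypes are stored as allele sizes in base pairs, with None for failed loci; drones compatible with two or more queens are set aside, as described in Section 2.4. Function names and data layout are hypothetical.

```python
def enough_loci(drone, min_loci=5):
    """True if at least `min_loci` loci were successfully genotyped."""
    return sum(allele is not None for allele in drone.values()) >= min_loci

def is_potential_son(drone, queen, tol_bp=1):
    """A haploid drone is compatible with a queen if, at every locus typed in
    both, its allele lies within tol_bp of one of the queen's two alleles."""
    compared = 0
    for locus, allele in drone.items():
        if allele is None or locus not in queen:
            continue
        compared += 1
        if not any(abs(allele - q) <= tol_bp for q in queen[locus]):
            return False
    return compared > 0

def assign_drones(drones, queens, tol_bp=1):
    """Map each queen id to its compatible drones; drones compatible with
    two or more queens are returned separately as ambiguous."""
    assigned = {qid: [] for qid in queens}
    ambiguous = []
    for i, drone in enumerate(drones):
        mothers = [qid for qid, q in queens.items() if is_potential_son(drone, q, tol_bp)]
        if len(mothers) == 1:
            assigned[mothers[0]].append(i)
        elif len(mothers) > 1:
            ambiguous.append(i)
    return assigned, ambiguous

def contribution_class(n_sons):
    """Classes used in the text: 0, 1-5 and >= 6 drones."""
    if n_sons == 0:
        return "0"
    return "1-5" if n_sons <= 5 else ">=6"
```

Widening `tol_bp` increases both true assignments and the chance that unrelated drones match a queen, which is consistent with the more relaxed manual assignments compared with COLONY.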

2.6 Effects of colony identity on representation in the drone capture

For the 19 known colonies sampled in both years, we used chi-square tests to analyse whether a colony that contributed to the drone sample in one year was also likely to contribute in the next, allowing for a change in colony genotype in the case of queen replacement. To determine whether the data from both years could be combined, we used a matched-pairs t test to investigate whether the average number of drones contributed per known colony differed significantly between years.
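The two tests can be sketched in a few lines: a paired t statistic on the per-colony differences between years (with n − 1 degrees of freedom for n colonies), and a Pearson chi-square on a 2 × 2 table of contribution (yes/no) in each year. The data below are hypothetical, the analyses in this study were done in SPSS, and with expected cell counts below 5 an exact test may be preferable.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for matched samples.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def chi_square_p_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical drone counts for the same colonies in two years.
drones_2018 = [0, 3, 12, 5, 0, 7, 1, 2]
drones_2019 = [2, 0, 10, 6, 1, 9, 0, 4]
t, df = paired_t(drones_2018, drones_2019)
# Rows: contributed in year 1 (yes, no); columns: contributed in year 2 (yes, no).
contingency = [[4, 2], [3, 1]]  # hypothetical
chi2 = chi_square_2x2(contingency)
print(round(t, 2), df, round(chi2, 2), round(chi_square_p_df1(chi2), 2))
```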

2.7 Statistical analyses

All statistical analyses were done using SPSS, Version 26 (Corp 2010). Means are given ± standard error (s.e.).

2.8 Colony distance, colony size and representation in the drone capture

As we found no effect of colony identity on drone contribution across years, and the measurements of colony size were significantly correlated between years (see Results), the drone samples from 2018 and 2019 were combined to assess the effects of distance and relative colony size on a colony’s drone contribution, using generalised linear models (GLMs) (Corp 2010). We used a binary logistic model to determine whether the probability that a colony contributed to a drone sample was related to colony size and distance to the DCA. For known queens that contributed at least one drone to a sample, we used a Poisson log-linear model to assess whether the number of drones contributed was related to colony size and distance to the DCA.
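Both models are canonical-link GLMs (logit link for the binary model, log link for the counts), so they can be fitted with the same Newton-Raphson iteration, which for canonical links coincides with iteratively reweighted least squares. The dependency-free sketch below only estimates coefficients; standard errors and p-values, which SPSS reports, would come from the inverse of the final information matrix and are omitted. The data are hypothetical, and the fit diverges if a binary outcome is perfectly separated by the covariates.

```python
import math

def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def predict(beta, X, family):
    """Fitted means: probabilities (binomial) or expected counts (Poisson)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    if family == "binomial":
        return [1.0 / (1.0 + math.exp(-e)) for e in eta]
    if family == "poisson":
        return [math.exp(e) for e in eta]
    raise ValueError(f"unknown family: {family}")

def fit_glm(X, y, family, max_iter=100, tol=1e-10):
    """Newton-Raphson (equivalently IRLS) fit of a canonical-link GLM.

    X: rows of covariates whose first column is 1.0 (intercept).
    family: "binomial" (logit link) or "poisson" (log link).
    """
    ybar = sum(y) / len(y)
    start = math.log(ybar / (1 - ybar)) if family == "binomial" else math.log(ybar)
    beta = [start] + [0.0] * (len(X[0]) - 1)
    for _ in range(max_iter):
        mu = predict(beta, X, family)
        w = [m * (1 - m) for m in mu] if family == "binomial" else mu
        p = len(beta)
        score = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X)) for j in range(p)]
        info = [[sum(wi * row[j] * row[k] for wi, row in zip(w, X)) for k in range(p)]
                for j in range(p)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: [intercept, relative colony size, distance to DCA (km)].
rows = [(12, 0.1), (25, 0.2), (8, 0.35), (30, 0.5), (15, 0.7), (20, 0.9), (10, 1.1), (18, 1.3)]
X = [[1.0, size, dist] for size, dist in rows]
drone_counts = [20, 14, 9, 6, 4, 3, 2, 1]
print(fit_glm(X, drone_counts, "poisson"))
# Binary model with distance only (add columns to X for more covariates).
contributed = [1, 0, 1, 1, 0, 1, 0, 1]
print(fit_glm([[1.0, dist] for _, dist in rows], contributed, "binomial"))
```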

2.9 Inferring sibships with COLONY

We used COLONY to infer the number of sibships among the drone genotypes. COLONY uses a maximum likelihood analysis of shared alleles to infer relatedness (Wang 2009; Utaipanon et al. 2019a, b). Following the COLONY V2.0.6.8 user guide (Wang 2022), we ran the analyses with a haplodiploid mating system, 3 runs and medium run length. We converted the number of inferred sibships into a colony density, assuming that drones are caught over an area of 44 km², and compared these density estimates with those based on our visual surveys for each year.

3 Results

Over 2 years, the visual surveys around the DCA identified a total of 81 honey bee colonies, 16 of which were managed (Figure 1). All feral colonies were in cavities of old Eucalyptus trees, mainly E. cladocalyx, E. camaldulensis and E. tricarpa. The number of nests was approximately proportional to the availability of old trees of these species (data not shown).

In 2018, 31 colonies were found in 1.34 km², giving an estimated density of 23.1 colonies/km². The 2019 survey covered a larger area, around two-thirds of which was suburban, and located 81 colonies in 7.65 km², giving an estimated density of 10.59 colonies/km² for the area searched.

3.1 Drones contributed from known colonies

In 2018, 14 (61%) of the 23 known colonies were represented in the sample by a total of 115 drones. In 2019, 23 (68%) of the 34 known colonies were represented by 177 drones. The remaining 9 (39%, 2018) and 11 (32%, 2019) known colonies were not represented in the drone samples (Figure 2). The proportion of colonies that were not represented did not differ significantly between the years (χ² = 0.39, p = 0.82). Combined over both years, 35% of known colonies did not contribute any drones to the sample.

Figure 2.

The distribution of the 23 colonies in 2018 and 34 colonies in 2019 with known queen genotypes that contributed zero (red), 1–5 (yellow) and ≥ 6 (green) drones. Blue: the DCA.

3.2 Drone contribution and hive activity over the years

For the 19 known colonies that were sampled in both years, a pairwise comparison showed no significant difference between years in the mean number of drones contributed to the sample at the DCA (2018: 5.0 ± 1.8 drones; 2019: 5.2 ± 1.9 drones; t = 0.13, p = 0.90, df = 19). Of the 19 colonies, 13 (68%) were occupied by a new queen in the subsequent year.

Contribution of drones in one year did not predict whether a colony would contribute drones in the other year (χ² = 0.17; p = 0.68, n = 19) or how many drones it contributed (χ² = 0.09; p = 0.76, n = 19). The results from 2018 and 2019 were therefore pooled to analyse the effect of distance and relative colony size on the number of drones contributed. For the 19 colonies assessed in both years, relative colony size was significantly correlated between years (r = 0.64, p = 0.01).

3.3 The effect of relative colony size or distance on the contribution of drones

Neither colony size nor distance to the DCA improved the model to predict whether a colony contributed at least one drone to the DCA (binary logistic regression analysis; colony size: p = 0.76; distance to the DCA: p = 0.90; n = 50 colonies), and there was no interactive effect (p = 0.98).

For the 36 known colonies that contributed at least one drone to the Williams trap, the number of drones contributed was not correlated with colony size (p = 0.67) but was negatively correlated with the distance of the colony to the DCA (r = −0.36, p = 0.02, Figure 3), and there was no significant interaction between size and distance (p = 0.88). Of all drones caught from known queens, 50% (n = 146) came from colonies located within 0.42 km of the DCA (Figure 3). Colonies less than 0.5 km from the DCA that contributed at least one drone to the sample contributed an average of 15.7 ± 4.8 (n = 14) drones, but this dropped to an average of 3.6 ± 0.4 (n = 22) at distances over 0.5 km. The nine colonies located more than 0.9 km from the DCA all contributed 5 drones or fewer (Figure 3).
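The two distance summaries in this paragraph can be computed from a list of (distance, drones contributed) pairs for the known colonies: the distance within which half of the assigned drones originated (a drone-weighted median of colony distance), and the mean contribution of contributing colonies on either side of a 0.5-km cut-off. This sketch is one plausible way to obtain such figures, not necessarily how they were computed here; the example data are hypothetical.

```python
from statistics import mean

def distance_holding_fraction(colonies, fraction=0.5):
    """Smallest colony distance d such that colonies at distance <= d
    contributed at least `fraction` of all assigned drones.

    colonies: iterable of (distance_km, n_drones) for known colonies.
    """
    colonies = sorted(colonies)
    total = sum(n for _, n in colonies)
    if total == 0:
        raise ValueError("no drones assigned")
    cumulative = 0
    for distance, n in colonies:
        cumulative += n
        if cumulative >= fraction * total:
            return distance
    raise ValueError("fraction must not exceed 1")

def mean_contribution(colonies, cutoff_km=0.5):
    """Mean drones per contributing colony closer than / at least cutoff_km away."""
    near = [n for d, n in colonies if n > 0 and d < cutoff_km]
    far = [n for d, n in colonies if n > 0 and d >= cutoff_km]
    return mean(near), mean(far)

# Hypothetical (distance in km, drones contributed) pairs.
known = [(0.1, 20), (0.3, 10), (0.45, 0), (0.6, 5), (1.0, 3), (1.2, 2)]
print(distance_holding_fraction(known), mean_contribution(known))
```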

Figure 3.

The number of drones captured from honey bee colonies of known queen genotype and the distance (km) from the colony to the drone congregation area (DCA) in 2018 (blue/circle points) and 2019 (orange/diamond points). The red shaded area highlights the colonies that contributed fewer than five drones. Fifty per cent of the drones captured from known colonies came from colonies within 0.42 km of the DCA (yellow area).

3.4 Using the sibship assessment from COLONY to assess the colony density

For 2018, COLONY inferred 275 colonies from 1566 drones, and for 2019, 332 colonies from 1484 drones. The average size of the reconstructed sibships was 5.7 drones in 2018 and 4.5 in 2019. Assuming that drones are caught over an area of 44 km², these equate to densities of 6.25 and 7.55 colonies/km², respectively.
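The conversions in this section, and their comparison with the visual survey densities from Section 3, are simple ratios. The sketch below recomputes them from the reported counts, assuming (as the text does) a fixed 44 km² trapping area; the dictionary layout and function name are illustrative.

```python
AREA_KM2 = 44.0  # assumed trapping area (Section 1)

# (colonies found, area surveyed in km²) from the visual surveys
visual = {2018: (31, 1.34), 2019: (81, 7.65)}
# (sibships inferred by COLONY, drones genotyped) from the drone samples
drone = {2018: (275, 1566), 2019: (332, 1484)}

def summarise(year):
    """Visual and drone-based densities (colonies/km²) and derived ratios."""
    found, surveyed = visual[year]
    sibships, n_drones = drone[year]
    visual_density = found / surveyed
    drone_density = sibships / AREA_KM2
    return {
        "visual_density": visual_density,
        "drone_density": drone_density,
        "mean_sibship_size": n_drones / sibships,
        "visual_to_drone_ratio": visual_density / drone_density,
    }

for year in (2018, 2019):
    summary = summarise(year)
    print(year, {key: round(value, 2) for key, value in summary.items()})
```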

4 Discussion

Our findings place some caveats on the use of drone samples from a DCA to assess honey bee colony density. Importantly, we found that the number of drones per colony caught in a Williams trap decreases significantly with distance to the DCA. This indicates that non-detection error increases with distance. In theory, this could be overcome by increasing the drone sample size to achieve adequate average representation of colonies at the furthest possible distance, i.e., 3.75 km from the trapping point (Utaipanon et al. 2019b), but this would require a substantially larger drone sample than currently suggested. These results also cast doubt on the accuracy of previous density assessments based on much smaller drone samples in areas with moderate to high colony density (5–20 colonies/km²), such as our study area (e.g., Moritz et al. 2007; Jaffe et al. 2010).

In addition to the relatively low contribution of drones per known colony at distances of more than 0.9 km from the DCA, and despite large sample sizes, a high proportion (35%) of the known colonies were not represented in the drone sample. Therefore, this study does not support the assumption that all colonies within 44 km² contribute at least one drone to a sample taken at a DCA. In theory, absence from the drone sample can be caused by a lack of drone production or by a lack of drone capture (both sources of non-sampling error), but in practice it is difficult to distinguish between the two. However, because the frequency of colonies that did not contribute drones did not increase with distance to the DCA, it seems likely that at least some of these colonies were not producing drones at the time of sampling. Whether and how this affects the accuracy of the sibship-based count of colonies represented at the DCA should be explored further.

It is possible that colonies that do not produce drones are weak and unhealthy (Utaipanon et al. 2019a). This hypothesis is not supported by our finding that relative colony strength was not correlated with the number of drones contributed by known colonies. Admittedly, our measurement of colony size was imperfect, because it relied on forager activity, which is an indirect measure of strength. Also, in one of the years, relative colony size was assessed in a different season from drone sampling. Nevertheless, our confidence in these indirect measurements is supported by previous studies showing a significant relationship between forager activity and colony size (Danka et al. 1986; Beekman et al. 2004). In addition, the significant correlation in the relative activity of colonies present in both years indicates that these measurements are consistent over time.

Drone contributions may also have been affected by recent queen replacement in our feral colonies, through either swarming or supersedure, particularly during the brief period when drones from the previous queen remained active at the DCA. Of our known colonies sampled in both consecutive years, 68% had replaced their queen. Combining samples taken at different dates over the swarming season, or at longer intervals, could therefore lead to overestimates of colony density. Further research is needed on how queen supersedure and swarming influence a colony’s representation in a drone sample, especially because the proposed time to sample drones coincides with peak swarming season.

A total of 21 drones were removed from analysis because they had two potential mothers among the known queens. While this did not affect the significance of our results, as all but two of these known queens were confirmed mothers of other drones, it re-emphasises the uncertainty of inferring sibships among drones with shared alleles. Drones from distinct but related colonies could lead to an underestimate of the number of colonies, because related drones, such as cousins, share alleles and are more likely to be falsely assigned to a single sibship. In theory, this source of underestimation could be reduced with larger sample sizes and more markers, especially in areas with a high density of honey bee colonies, because more markers decrease the likelihood that non-brother drones share all or most alleles.

The densities inferred from our drone samples were consistently lower (3.7 and 1.4 times) than those based on visual surveys. Visual surveys have been considered inadequate because of the difficulty of locating all feral colonies, and we agree that this is an important shortcoming. Visual density estimates are, however, a reliable way to determine the lowest possible density of colonies in a homogeneous habitat. Our visual estimates may have exceeded the inferred densities because non-detection and non-sampling errors caused the latter to be underestimates. However, the landscape covered was not uniform, and our survey (with a few exceptions) did not cover colonies in urban backyards or in steep forested terrain. We estimate that two-thirds of the 44 km² surrounding the DCA is urbanised. The density of colonies in un-surveyed suburban areas was likely substantially lower than in the surveyed Waite Arboretum, which contains many old Eucalyptus trees with suitable nesting hollows. This is reflected in our visual density estimates, which halved when the survey expanded to cover more urbanised area. However, we do not know how much of this difference is due to a lack of nesting hollows versus a lack of opportunity to survey for occupied hollows and managed hives in backyards. Hence, there are two non-mutually exclusive explanations for why the density estimated from captured drones was lower than that from our visual observations: (1) the density of colonies is likely to be substantially lower in the area over which drones are attracted to the Williams trap than in the area we surveyed visually, and (2) to avoid non-detection errors in the COLONY-inferred sibships over a radius of 3.75 km, the number of drones captured would need to be much larger than the ~ 1500 used in this study.

When visual inspection is done in areas that contain suitable nesting hollows and then extrapolated to unsuitable areas, colony density across the whole landscape can be overestimated (Oldroyd et al. 1997). It could be argued that this would in turn overestimate the ecological impact and pollination services of feral honey bee colonies. While this is correct for pollination services in crops, averaging ecological impact over a whole landscape does not make sense, because the distribution of foraging workers is determined by the availability of floral resources, which are generally more abundant in areas that contain many nesting hollows. This implies that different spatial scales may be needed when assessing feral colony density in relation to economic and ecological impact.

The negative correlation between a colony's distance to the DCA and the number of drones it contributed suggests that non-detection errors increase for colonies further from the DCA. It may be possible to correct for this using a probability function of distance (Arundel et al. 2013). This requires further quantification of the effect of distance at a wider scale, because this study did not sample the entire area that the density estimate is predicted to encompass, and our drone samples, while substantial, were not large enough for the density observed. Furthermore, the area over which drones are attracted to a Williams trap may not be fixed but may be influenced by the structure of the landscape, the configuration of trees, the number of suitable sites for DCAs and even by colony density itself. Currently, there is insufficient evidence to assume a single value for the area of attraction, and a relative measure of colony density (high/medium/low) is probably more appropriate.
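One generic way to build a distance-dependent correction, borrowed from distance sampling and not necessarily the approach of Arundel et al. (2013), is to replace the fixed 44 km² by an effective area: the integral of a detection function g(r) over the disc, ∫ g(r) 2πr dr. The density estimate then becomes the number of sibships divided by this effective area. The half-normal g(r) and its scale parameter below are hypothetical placeholders that would have to be estimated from data such as Figure 3; with g(r) = 1 everywhere, the effective area reduces to the full circle.

```python
import math

def half_normal(distance_km, sigma_km):
    """Hypothetical detection function g(r) = exp(-r^2 / (2 sigma^2)); g(0) = 1."""
    return math.exp(-distance_km ** 2 / (2 * sigma_km ** 2))

def effective_area_numeric(g, radius_km, steps=10_000):
    """Integral of g(r) * 2*pi*r from 0 to radius_km (trapezoidal rule)."""
    h = radius_km / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(r) * 2 * math.pi * r
    return total * h

def effective_area_half_normal(sigma_km, radius_km):
    """Closed form of the same integral for the half-normal g."""
    return 2 * math.pi * sigma_km ** 2 * (1 - math.exp(-radius_km ** 2 / (2 * sigma_km ** 2)))

def corrected_density(n_sibships, effective_area_km2):
    """Colonies per km² when detection declines with distance."""
    return n_sibships / effective_area_km2

sigma = 1.5  # hypothetical scale (km)
area = effective_area_half_normal(sigma, 3.75)
print(round(area, 1), round(corrected_density(275, area), 1))
```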

4.1 Concluding remarks

In conclusion, the comparison of the known colonies to the drone captures indicates that the use of drone samples from a DCA to estimate absolute colony densities is problematic. There are two reasons for this:

  1. Diminishing drone contributions with distance from the DCA. This calls for further work on the sample size needed to achieve an accurate assessment of density over the full 44 km².

  2. The non-sampling error. Our finding that > 30% of known colonies did not contribute any drones to the sample caught at the DCA suggests that many colonies in the area will remain undetected.

We suggest that the non-sampling error arises from insufficient sample size and/or changes in drone production due to queen supersedure and swarming. Taken together, the influence of distance and of the colony cycle on the efficacy of the method should be explored further, to assess whether these shortcomings can be remedied with correction factors for the effects of distance, non-drone-producing colonies and changes due to swarming.