1 Introduction

The management of honeybees (Apis mellifera Linnaeus, 1758) is based on economic as well as ethical decisions. Ever since humans began managing honeybee colonies, they have had to take responsibility for them and choose Beekeeping Management Practices (BMPs) that ensure both the health of the colony and the productivity of the hive (Sperandio et al. 2019, p. 3). The selection of BMPs is very important for the overwintering success of the colonies (Jacques et al. 2016, 2017). Since the nineteenth century, beekeepers have tried to influence the population dynamics of honeybee colonies to make them grow rapidly and reach high numbers of adult bees during the foraging season. Three core traits of honeybee colony dynamics are brood size, adult worker bee population and honey reserves (Requier et al. 2017, p. 1162). With these traits, we can describe the colony dynamics, which seem to depend largely on the egg-laying rate of the queen, itself much influenced by her age, and on the infection of individual bees with parasites and viruses. This has been shown in mathematical models that describe colony development incorporating these factors (Khoury et al. 2011, 2013) or adding disease and foraging dynamics (Becher et al. 2014; Horn et al. 2021) and seasonality (Messan et al. 2021). Colony dynamics and the quantity of in-hive products (bee bread and sugar feed stores) can be measured in the field by a ‘visual estimation of comb surface covered by adult bees and brood cells’ (Requier 2019, p. 6).

In this experimental approach, we focus on the impact of the BMP of separating brood and honey spheres in the hive using queen excluders (QEs) as selective barriers. These devices consist of a metal or plastic grid, wide enough for workers to pass but too narrow for the queen and drones. They separate the honeycombs from the brood nest of the colony, preventing the queen from laying eggs in the honeycombs (Crane 2009a, p. 69). In feral or wild honeybee colonies, honey (and pollen) is stored above and alongside the brood nest, i.e. honey, pollen and brood cells share the combs in the colony's centre (Seeley and Morse 1976, p. 500). Several people are presumed to have developed and applied queen-excluding devices: Peter Iwanowitsch Prokopowitsch (1775–1850) (Geiseler 2011, p. 42 f; Schade 2012, p. 220), Abbé Collin (Collin 1875 cited in Crane 1978) and Friedrich August Hannemann (Whyte 1919 cited in Crane 1999), among others.

The use of QEs is standard in modern beekeeping. Aims include, e.g., strictly separating brood from honey frames (Crane 1978, 2009b, p. 69), harvesting honey from scarce floral resources, rearing queens in queenright colonies, keeping colonies with two queens or limiting brood production (Geiseler 2011). While QEs are commonly added to hives because of expected advantages in colony management, their effects on the colony, honey quality, labour organisation and economy are still intensely disputed.

Up to now, there is no scientific evidence that QEs disturb colony development or have any other effects (Garrido and Nanetti 2019, p. 77). However, the effects of QEs have not yet been studied systematically. Critics assume that honey quality is lowered by using the device. They argue that climatic conditions in the supers, i.e. the top combs where the honey is stored, are altered by the separation from the brood sphere (Gerstmeier and Miltenberger 2018, p. 218). Brood production requires temperatures of 34–36 °C (Stabentheiner et al. 2021). Lower temperatures and the absence of nectar-manipulating worker bees are assumed to hinder the ripening process of the honey when bees are forced to store it outside the brood nest (Bretschko 1985, p. 61; Lampeitl 2009, p. 38; Rindberger 2020, p. 14 f.).

QEs are commonly used even in certified organic beekeeping as defined in EU regulation 2018/848 (European Union 2018). Biodynamic beekeeping as specified by the Demeter association in its standards is characterised by (i) allowing the colonies to build natural honeycombs and an undivided brood area; (ii) using the process of swarming as a basis for reproduction, growth, rejuvenation and breeding and (iii) using the colony's own honey (min. 10%) for supporting the colony through the winter. The Demeter association restricts the use of QEs in its standards due to concerns regarding colony dynamics and honey quality. According to these regulations, they may only be used exceptionally upon formal request (Demeter-International eV 2017).

As the assumed negative effects of the QE have been questioned within the German Demeter beekeepers’ professional group, a transdisciplinary participatory research project was carried out from 2018 to 2021, aiming at generating the first scientific insights into the effects of QEs on honey quality (2018–2019), colony dynamics (2020) and labour organisation (2018–2021). In a beekeeper-driven on-farm experiment, the population dynamics of 32 hives managed with and 32 hives managed without QE were assessed along with indicators of honey quality and labour economy. In this paper, we report the results of the investigation of colony dynamics.

2 Materials and methods

2.1 Experimental setup

The study was carried out in Germany as a transdisciplinary participatory on-farm experiment (Grunwald et al. 2020). In 2018, eight beekeeping operations provided eight honeybee colonies each from their stock. Out of these 64 experimental units, half of the colonies from each apiary were equipped with a queen excluder (QE), resulting in a total of 32 colonies with and 32 colonies without an excluder. Hives in the apiaries were managed according to the treatment (with/without QE) over 4 years. In 2018 and 2019, the effects of the QE on honey quality were studied (Geier et al. in prep.). Colony dynamics were studied only in 2020 due to financial constraints. By then, the colonies were in the 3rd year of differentiated management, with the following exception: after the second year, one beekeeper left the project group and was replaced by another one. Thus, eight beekeepers, each with four hives with and four hives without QE, participated in each year of the project period.

The participating beekeepers were characterised by their membership in the Demeter organic farmers’ association and their willingness to cooperate in the project. As we aimed at representing both beekeepers who are generally critical of using the barrier and others with a more positive attitude towards the device, the beekeepers were specifically selected according to their general attitude towards the device, with the aim of reaching a balance. The operations were (with one exception) distributed in the southern and eastern parts of Germany. For all of them, beekeeping contributed to the family income (commercial businesses).

All colonies were managed in the years before and during the experiment according to the Demeter regulations for beekeeping (Demeter eV 2022). The setup of the experimental groups (with/without QE) within each apiary was based on colony development in the first year: beekeepers chose the first eight colonies in an economically used apiary that received a honey super in spring, thus selecting the best-performing healthy and queenright colonies. These colonies were assigned alternately to the treatments, thus achieving a balanced distribution of adult worker numbers between the treatments. The genetic origin remains unknown because biodynamic beekeepers generally rely on local mating. In Germany, A. mellifera carnica is the most widespread subspecies. The specific management of the colonies followed the general management practices of each beekeeper, as the experiment was intended to study the effects of the QE against the background of real-world diversity in biodynamic beekeeping. Common to all experimental hives was that for 2 years prior to and within the data collection phase, no splits or queen replacement occurred; natural swarming or requeening was accepted. Health data, especially on varroa infestation rates, were not available. In rare cases, colonies were replaced before the data collection phase to maintain a balanced design (cf. Table IV in the Appendix). The specific type of QE employed was not prescribed, but it had to provide for the bee space between the grid and the upper frames.

2.2 Method for the assessment of colony dynamics

For data collection, we used the visual Liebefeld estimation method, which under field conditions has proved to be an easy, cost-efficient, reproducible and standardised method (Wille and Gerig 1976; Gerig 1983; Dainat et al. 2020). The estimations were repeated on five dates between April 6 and October 16 in 2020 (Table I). Due to phenological shifts, each estimation period spanned 5–11 days.

Table I Estimation periods of the field experiment for all beekeepers

The total number of observations in the dataset is N = 320 (64 hives on five dates each). On each date, two scientists (experienced beekeepers themselves) visually determined the number of adult worker bees, worker brood cells, drone brood cells, empty cells, pollen cells and feed/honey cells in the brood chamber and in honey supers, if present. Note that the parameters are not independent of each other, as all parameters of colony development derive from the egg-laying activity of one queen. Additionally, they rated pollen stores and brood cells in honey supers as present/not present. Cells that were left empty in the centre of honey frames were recorded in the same way. Beekeepers, not only in our project group, often observe this phenomenon of ‘provident brood cells’ and explain the behaviour as a provision for future brood production. This is an explanation from practitioners; no scientific reference for this behaviour exists to date.

One researcher surveyed the two apiaries in the northern/eastern part of Germany, while the other researcher surveyed the remaining six apiaries in southern Germany. The situation in the hives at each date was documented by photographs of both sides of each frame in the brood chamber and of brood frames in honey supers. As the estimations were made during daytime, the number of returning bees per minute was counted to provide a measure of the number of foraging bees that had not been present inside the hive during data collection, supposing that one homing bee per second equals 2000 bees foraging (Aumeier 2008, 2010).
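The forager correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and both the linear scaling of the rule of thumb and the addition of the forager estimate to the in-hive count are assumptions made here for clarity.

```python
# Rule of thumb quoted in the text: one homing bee per second
# corresponds to about 2000 bees out foraging.
BEES_PER_HOMING_BEE_PER_SECOND = 2000


def estimate_foragers(returning_bees_per_minute: float) -> float:
    """Estimate bees absent from the hive from a one-minute entrance count.

    Assumption: the rule of thumb scales linearly with the homing rate.
    """
    if returning_bees_per_minute < 0:
        raise ValueError("the entrance count must be non-negative")
    homing_per_second = returning_bees_per_minute / 60
    return homing_per_second * BEES_PER_HOMING_BEE_PER_SECOND


def total_adult_bees(bees_on_combs: float, returning_bees_per_minute: float) -> float:
    """Assumption: foragers are simply added to the bees estimated on the combs."""
    return bees_on_combs + estimate_foragers(returning_bees_per_minute)
```

For example, a count of 30 returning bees per minute would correspond to an estimate of 1000 foragers under these assumptions.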

2.3 Data preparation

The comb area dedicated to each of the parameters was measured in units (one unit usually corresponds to one-eighth of the total area of the respective frame) and documented in an estimation protocol according to Imdorf and Gerig (1999); in addition, notes were made directly at the apiary, e.g. on the order of the assessment. The number of units for each of the recorded parameters was converted into absolute numbers by multiplying it by the corresponding area of a unit (dm²) and by literature values for the number of brood cells or adult bees per area (Aumeier 2016) (see Table II). The estimations for each apiary and date were performed according to DBJ (2016).
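The conversion from estimated units to absolute numbers can be illustrated as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the frame area and density used in the example are placeholders rather than the values from Table II or Aumeier (2016).

```python
# From the text: one unit usually corresponds to one-eighth of a frame side.
UNITS_PER_FRAME_SIDE = 8


def units_to_count(units: float, frame_side_area_dm2: float, density_per_dm2: float) -> float:
    """Convert estimated comb units into an absolute number of bees or cells.

    frame_side_area_dm2 and density_per_dm2 depend on the hive system and on
    the literature values used; they are inputs here, not fixed constants.
    """
    if units < 0 or frame_side_area_dm2 <= 0 or density_per_dm2 <= 0:
        raise ValueError("units must be >= 0; area and density must be > 0")
    unit_area_dm2 = frame_side_area_dm2 / UNITS_PER_FRAME_SIDE
    return units * unit_area_dm2 * density_per_dm2


# Placeholder example: 4 units on a frame side of 8 dm2,
# with a hypothetical density of 400 cells per dm2.
example_cells = units_to_count(4, 8.0, 400)
```

Keeping area and density as parameters makes it easy to apply the same conversion to different frame formats and to different recorded parameters.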

Table II Recorded parameters (Liebefeld estimation method)

Other data taken from the beekeepers' records were (i) whether the colony was already equipped with an excluder before the experimental year 2020 and (ii) the honey yields of the year 2020 with the respective harvest dates. From the aggregated data, 14 derived parameters were calculated, either because they belong to the Liebefeld method (Imdorf et al. 2008, p. 71 ff.) or because the beekeepers in the project group showed a special interest in the spatial distribution of cell contents and the cell utilisation rates within the hives. The aggregated number of units was therefore calculated separately for (i) the total hive (= all frame sides within the hive, including brood chamber and supers) and (ii) the brood space (= frames directly available to the bees in the brood chamber, i.e. without the frames behind the follower board). The follower board is a hive addition that separates the brood chamber vertically into two parts to adjust the available space to the size of the colony. The specific boards widely used in German beekeeping and by our project beekeepers let bees pass at both sides and at the bottom of the board, so that they can reach spare food stores during winter/spring or use additional wax foundations when growing. Table III includes the definitions and references for the derived parameters. Absolute values are published in the dataset in Online Resource 1.

Table III Derived parameters (calculated from recorded parameters)

After data collection, the processing and analysis of the data were performed using Stata 16 (StataCorp 2019). First, measurement errors were screened out. Measurement dropouts, i.e. data points for which colonies failed to deliver data, were registered: five colonies were entirely excluded from the analysis (ID = 12, 17, 42, 43, 60) because they could not be estimated on more than one consecutive estimation date. They were too small or queenless before the end of the experiment, and beekeepers had to take them out in order to prevent them from dying. Colonies that could not be estimated on date 1 or date 5 were excluded only for the respective date (ID = 16, 35, 41), in order not to lose too many experimental units for the remaining dates. For the same reason, single missing data points were imputed (ID = 1, 10, 15) with the mean of the previous and subsequent measurements when both were available. This occurred especially if it was not possible to assess the colony on date 2 because beekeepers did not want to disturb the mating flight of a young queen. A graphic representation of this is available in the Appendix (Figure 8).
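The imputation rule for single missing data points can be sketched as below. The analysis itself was carried out in Stata; this Python function is a hypothetical illustration of the rule as described, with `None` standing for a missing estimate.

```python
from typing import List, Optional


def impute_single_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill isolated gaps with the mean of the neighbouring measurements.

    A gap is filled only if both the previous and the subsequent value were
    actually measured. Gaps on the first or last date, and runs of
    consecutive gaps, are left untouched.
    """
    result = list(series)
    for i in range(1, len(series) - 1):
        previous, current, following = series[i - 1], series[i], series[i + 1]
        if current is None and previous is not None and following is not None:
            result[i] = (previous + following) / 2
    return result


# Example: a colony that could not be assessed on date 2.
filled = impute_single_gaps([100.0, None, 200.0, 180.0, 150.0])
```

Because the neighbours are read from the original series rather than from the partially filled result, an imputed value is never used to impute another one.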

2.4 Descriptive analysis

For a descriptive analysis, each parameter was presented as a boxplot. For each date, the distribution of the measured values was shown separately for the colonies with QE and the colonies without QE. The boxes reflect the first quartile (25th percentile), the median and the third quartile (75th percentile) and thus represent the middle 50% of all measured values.

Then growth curves of the individual characteristics were calculated, which depict the average change compared to the first estimation date. For this purpose, the values of the first estimation date were standardised to 100%; all other measurements are presented in relation to this reference date. This form of standardisation enables better comparability of differently sized and equipped colonies at the beginning of the survey period. The values of the variables standardised in this way range between 0 and ∞%. If parameters were already available as proportions (nurturing load, share of brood cells in brood space, share of used cells in the brood space, share of cells used for brood/supplies in the brood area), the proportions were presented directly instead.
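The standardisation to the first estimation date can be written as a short helper. This is an illustrative sketch with a hypothetical function name, not the Stata code used in the study.

```python
from typing import List, Optional


def standardise_to_first_date(values: List[Optional[float]]) -> List[Optional[float]]:
    """Express each measurement as a percentage of the first estimation date.

    The first value becomes 100%; missing values (None) stay missing.
    """
    baseline = values[0]
    if baseline is None or baseline <= 0:
        raise ValueError("a positive value on the first date is required")
    return [None if v is None else 100 * v / baseline for v in values]


# Example: a colony growing from 10,000 to 15,000 bees and shrinking to 5,000.
growth = standardise_to_first_date([10000, 15000, 5000])
```

The check on the baseline reflects why colonies without an estimate on date 1 cannot be standardised in this way.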

Ninety percent confidence intervals were calculated from the standard errors of the arithmetic means. If the respective confidence intervals did not overlap for an estimation date, a statistically significant difference between the two groups was assumed to exist. Due to the comparatively low number of cases, we chose a 10% significance level (= 90% confidence interval). The curves show the growth of the respective parameter for the experimental and the control group.
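The overlap criterion can be made concrete with a small sketch. The text does not state which quantile was used to build the intervals, so the normal quantile below is an assumption (a t quantile would give slightly wider intervals for small groups); the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def confidence_interval(values: Sequence[float], level: float = 0.90) -> Tuple[float, float]:
    """Confidence interval of the mean, built from its standard error.

    Assumption: normal approximation for the quantile.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    standard_error = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(values)
    return centre - z * standard_error, centre + z * standard_error


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def differs_significantly(group_a: Sequence[float], group_b: Sequence[float]) -> bool:
    """Decision rule from the text: non-overlapping intervals count as significant."""
    return not intervals_overlap(confidence_interval(group_a), confidence_interval(group_b))
```

Note that requiring non-overlapping intervals is a conservative rule: two groups can differ significantly in a direct test even though their individual intervals overlap slightly.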

The analysis of growth plots was then carried out separately for high and low values of other parameters of the dataset (‘3-way analysis’), because we wanted to make sure that possible effects would not be masked by the interaction with other factors, e.g. the size of the colony, the order of the assessment on each estimation date or the duration of the exposure to the treatment (QE). Central variables were divided along the median, so that two groups of equal size emerged. For example, the effect of queen excluders on the number of adult worker bees was determined separately for colonies with a high and a low nurturing load. The aim of this analysis was to detect effects that may cancel each other out: if the QE had a positive effect on the number of bees for colonies with a high nurturing load but a negative effect for colonies with a low nurturing load, these two opposing effects would compensate for each other in the previous growth plots and thus be overlooked.
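The grouping behind the 3-way analysis can be sketched as a median split crossed with the treatment. The record layout (`id`, `treatment` and the split variable as dictionary keys) is a hypothetical choice for this example, and assigning values equal to the median to the low group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median
from typing import Dict, List, Tuple


def three_way_groups(colonies: List[dict], split_key: str) -> Dict[Tuple[str, str], List[int]]:
    """Group colony IDs by (half of the median split, treatment)."""
    cut = median(c[split_key] for c in colonies)
    groups: Dict[Tuple[str, str], List[int]] = {}
    for colony in colonies:
        # Assumption: ties with the median fall into the "low" half.
        half = "high" if colony[split_key] > cut else "low"
        groups.setdefault((half, colony["treatment"]), []).append(colony["id"])
    return groups


# Hypothetical example data, not taken from the study.
example = [
    {"id": 1, "treatment": "QE", "nurturing_load": 0.5},
    {"id": 2, "treatment": "noQE", "nurturing_load": 1.0},
    {"id": 3, "treatment": "QE", "nurturing_load": 1.5},
    {"id": 4, "treatment": "noQE", "nurturing_load": 2.0},
]
groups = three_way_groups(example, "nurturing_load")
```

Growth curves would then be computed per group, so that opposing QE effects in the low and high halves become visible instead of averaging out.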

2.5 Exploratory factor analysis

Subsequently, two exploratory factor analyses were performed (Taherdoost et al. 2022). The first was calculated from the recorded parameters only, the second additionally included the derived ones. In the first factor analysis, the total number of adult workers was removed in an iterative process due to cross-loadings on several factors. In the second analysis, honey harvest and worker brood (cross-loadings) as well as drone brood and pollen (singular factors) were excluded. Bartlett’s test of sphericity yielded p-values of 0.00 for both analyses, and the Kaiser-Meyer-Olkin (KMO) measure was 0.74 and 0.57, respectively, so the basic requirements for both factor analyses were met. Instead of the common varimax rotation, an oblique (oblimin) rotation was carried out, since it could not be assumed that the derived parameters were uncorrelated. From the first analysis, three factors emerged after the rotation; in the second analysis, with all derived parameters, six factors emerged. With these factors, the previous analyses (box plots and growth plots) were calculated again.

2.6 Concept analysis: Pugh decision matrix

To ensure that a trend in the effects of the QE was not masked by evaluating exclusively aggregated data from all beekeepers, and that a pattern in the results of the individual beekeepers would not remain undiscovered, we qualitatively interpreted the results using a Pugh decision matrix. This multi-criteria decision-making tool from the field of concept selection is rooted mainly in engineering design; it was proposed by Pugh (1981) and has been applied and developed since then (Tam et al. 2004; Frey et al. 2009; Guler and Petrisor 2021). We interpreted the treatments as two management options, comparing the one ‘with QE’ against the base concept (‘without QE’) for each specific beekeeper. For this purpose, all 120 growth curves were assigned to three categories (+1 = higher growth curves for hives with QE; 0 = similar curve progression for both treatment groups; −1 = lower growth curves for hives with QE) by the team of estimators and visualised in a matrix. Since two of the parameters (‘number of empty cells’ and ‘nurturing load’) could be assessed as rather negative for the beekeeper’s (economic) objectives, they were taken out of the evaluation in a second assessment cycle and assigned to the respective opposite category (thus considered inversely) in a third cycle.
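The three evaluation cycles can be expressed as one scoring function applied to the ratings of a single apiary. This is a minimal sketch: the function name, the parameter names in the example and the ratings themselves are hypothetical, and the ratings in the study were assigned by the estimators from the growth curves rather than computed.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def score_apiary(
    ratings: Dict[str, int],
    excluded: Iterable[str] = (),
    inverted: Iterable[str] = (),
) -> Tuple[int, Counter]:
    """Sum the Pugh ratings (+1, 0, -1) of one apiary and count each category.

    excluded: parameters left out of the evaluation (second cycle).
    inverted: parameters whose rating changes sign (third cycle).
    """
    excluded, inverted = set(excluded), set(inverted)
    adjusted = []
    for parameter, rating in ratings.items():
        if rating not in (-1, 0, 1):
            raise ValueError(f"invalid rating for {parameter}: {rating}")
        if parameter in excluded:
            continue
        adjusted.append(-rating if parameter in inverted else rating)
    return sum(adjusted), Counter(adjusted)


# Hypothetical ratings for one apiary.
ratings = {"adult workers": 1, "worker brood": 0, "empty cells": 1, "nurturing load": 0}
special = ("empty cells", "nurturing load")

cycle_1 = score_apiary(ratings)
cycle_2 = score_apiary(ratings, excluded=special)
cycle_3 = score_apiary(ratings, inverted=special)
```

The total score corresponds to the column sum of the matrix, while the category counts show which rating predominates for the apiary.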

3 Results

At the beginning of the monitoring (April), the colonies comprised 10,005 ± 3097 adult worker bees, 14,801 ± 5787 worker brood cells, 454 ± 328 g of pollen and 3794 ± 2216 g of stored honey (mean ± SD). Bar charts with absolute values for all parameters are available in Online Resource 2.

3.1 Descriptive statistical evaluation at group level

The group medians of the experimental group (‘with queen excluder/with QE’) and the control group (‘without QE’) do not differ significantly even at a low significance level of 10% (p > 0.1), as the confidence intervals of the two groups overlap for all recorded and derived parameters on every estimation date. Figures 1 and 2 show the results for adult workers and worker brood; charts for the other parameters are available in Online Resource 2.

Figure 1.

Growth graph of adult worker population of Apis mellifera L. in relation to first estimation date (in %). The blue curve shows mean values of 32 experimental hives with queen excluder (QE), the orange curve shows results for 32 hives without queen excluder. N, number of observations in the respective treatment group.

Figure 2.

Growth graph of the number of A. mellifera L. worker brood cells in relation to first estimation date (in %). The blue curve shows mean values of 32 experimental hives with queen excluder (QE), the orange curve shows results for 32 hives without queen excluder. N, number of observations in the respective treatment group.

There was also no significant difference in the direct comparison of the number of colonies that dropped out of the study for different reasons or of the colonies that had pollen or provident brood cells in the honey supers (Figures 3 and 4). However, 15 of the 32 colonies without excluder (46.9%) showed brood in the honey supers between May 15 and August 7, six of these on more than one estimation date (Figure 5). Thus, colonies without excluders tended to rear brood in the honey supers. By the end of July, the proportion of top-breeding hives was reduced to 6%. Out of 320 data points, only 19 (5.9%) showed brood in the honey supers. All of these observations fell within the three estimation periods between May 15 and August 7, where they correspond to 9.9% of the 192 data points.

Figure 3.

The bar chart shows the IDs of Apis mellifera L. hives which displayed pollen in honey supers. Light grey IDs showed this characteristic on only one estimation date, coloured IDs multiple times. The total number of hives in the experiment was 64.

Figure 4.

The bar chart shows the IDs of A. mellifera L. hives which keep empty cells in the middle of honey supers as provident brood cells. Light grey IDs showed this characteristic on only one estimation date, coloured IDs multiple times. The total number of hives in the experiment was 64.

Figure 5.

The bar chart shows the hive IDs of A. mellifera L. colonies with eggs or larvae in honey supers. Light grey IDs showed this characteristic on only one estimation date, coloured IDs multiple times. The total number of hives in the experiment was 64.

The 3-way analysis did not yield striking results. As an example, Figure 6 shows the growth plots of the adult worker population depending on eight other parameters. For this analysis, the dataset was subdivided into two groups, i.e. the 50% of the colonies below and the 50% above the median of the respective parameter (worker brood, drone brood, pollen stores, share of used cells within brood area, nurturing load, honey yield, history and order within assessment). All 3-way charts are available in Online Resource 2.

Figure 6.

For this 3-way analysis, the dataset was subdivided along the median, i.e. into the 50% of the 64 experimental A. mellifera L. colonies below and above the median of eight parameters (worker brood, drone brood, pollen stores, share of used cells within brood area, nurturing load, honey yield, history and order within assessment). The graphs show the development of the adult worker population in relation to the first estimation date (in %) depending on these subdivisions.

3.2 Hidden colony dynamics processes: factor analysis

The results of the exploratory factor analysis (Tables V and VI in the Appendix) did not show any meaningful factor combinations. We considered a factor combination meaningful when it was highly influential and at the same time gave additional information by describing a previously overlooked part of colony dynamics.

3.3 Concept analysis: Pugh decision matrix

The resulting matrix of rated parameters for each beekeeper revealed that for one apiary (3), the group with QEs shows predominantly lower growth rates of the parameters (dotted/−1); for two apiaries (1 and 2), the group with QEs shows higher growth rates (shaded/+1); and for four apiaries, curves rated as ‘similar to each other’ predominate (white/0). For one apiary (4), no clear assignment could be made in the first evaluation cycle, as equally many parameters were assigned to the categories ‘higher with QE/+1’ and ‘similar/0’. When ‘empty cells’ and ‘nurturing load’ were assumed to have a neutral or inverse impact on overall colony performance, the results changed slightly (evaluation cycles 2 and 3). Figure 7 shows the condensed results of this method. The overall evaluation for each beekeeper is shown in the bottom row. Apiary 4, which could not be assigned before, now has a prevalence of higher curves with QE, so that three apiaries now belong to this category. The total score per apiary (= sum per column) shows how much the outcome for the individual parameters varies between beekeepers: for beekeeper 3, ten out of 15 parameters performed better without QE; for beekeeper 1, the ratio was reversed; and for beekeeper 5, ten out of 15 parameters were similar in the treatment and control groups.

Figure 7.

The evaluation output for each apiary in the field study, based on the Pugh matrix concept (3rd evaluation cycle): growth curves were individually assessed for each apiary and parameter. Shaded cells (+1) represent higher growth rates for the group with QE, white cells (0) show that no clear evaluation was possible as growth curves were similar to each other, and dotted cells (−1) mean that the hives with QE showed lower growth rates than the control group. On the right side, the overall evaluation per parameter is shown. The total score is the sum per row. The bottom lines show the overall evaluation per apiary (= sum per column). We then counted which rating category predominated in the overall evaluation.

In the last column, the condensed results for each parameter show that the count of ratings for the two most frequent categories is always close to each other or even the same. Thus, no clear trend for the overall effect for all apiaries regarding a certain parameter can be identified from the decision matrix. The sum of the ratings for each row is positive for eight out of 13 parameters, meaning that overall, the QE favoured their performance. However, growth rates of stores in the brood space and non-empty cells are never higher in hives with QE, and in half of the apiaries, they are lower.

4 Discussion

In summary, we found no significant deviation of group medians for the parameters of colony dynamics between hives managed with and without QEs in eight biodynamic beekeeping companies. An exploratory factor analysis did not reveal previously unseen factor combinations. A closer, evaluative look at the results in a Pugh decision matrix showed that for four beekeepers, the colonies with QEs did not differ in most parameters from the control group, while three performed better with and one without excluder.

Evaluating the specific results of an experimental site in a Pugh decision matrix is not a common method in agricultural science. In the agricultural context, it has rather been used to determine the best choice between several tool and equipment designs (Seechurn and Boodhun 2018) or policy options (Baležentis et al. 2021). Here, it proved to be a valuable tool for the evaluation of contradicting management practices, which we interpreted as diverging concepts to choose from. When discussing the results in the project group of beekeepers, we found that for individual beekeepers, the importance of single parameters of colony dynamics varies according to their economic goals. We therefore decided not to weight the parameters, which would otherwise be an important step in this method when applied by individual people, e.g. in a team (Cervone 2009). Accordingly, we are not able to normatively judge the ‘performance’ of the colonies. Such an interpretation can only be made by the respective beekeeper in the context of her/his operation and the social and cultural background. We highlight that for individual beekeepers, this method can be a helpful tool to choose between management concepts. It should be enriched by adding criteria on workload and work organisation, hive product quality and long-term effects on health and survival of the colonies. In this study, it did not show a clear trend for a comprehensive normative evaluation of QEs in general.

The colonies in this experiment started with comparatively small numbers of individuals (cf. Chabert et al. 2021). We rate this as a benefit for the accuracy of our study because in the Liebefeld method, smaller hives increase the precision of the total estimate (Bargen et al. 2020, p. 105). The experimental colonies varied enormously in size. We took these differences into account through the standardisation method applied. Other environmental factors may also have affected our results. Some potentially very important parameters influencing the development of the colonies were not measured, e.g. the level of infestation with the parasitic mite Varroa destructor. However, we assume that potential differences in climatic conditions and health status of the colonies were levelled out, as we used a comparatively high number of beehives for a field study (cf. European Food Safety Authority 2013) in eight places across Germany.

Owing to economic constraints, we were not able to conduct the estimations every 21 days as intended in the Liebefeld method. Therefore, we were not able to gain any information about the lifespan of adult workers or the daily growth or decline rate of the colony. Also, the derived parameters ‘productivity potential’ and ‘brood production’ are only a proxy for total seasonal dynamics and are not to be interpreted as absolute numbers (Imdorf et al. 2008, p. 71 ff.). For further studies of QEs or other hive additions in Western Europe, we recommend adhering strictly to the 21-day estimation rhythm and rather ending the experiment earlier, as the last estimation date contributed little to our research question because colony dynamics become less intense from August onward. It seems more important to cover the entire swarming and main foraging period.

Large numbers of experimental hives require the division of labour among several (in this study two) estimators/researchers, which makes sophisticated synchronisation methods for the Liebefeld method necessary. To avoid this, a worthwhile approach would be the use of software for image analysis. Bargen et al. (2020) compared the data generated with photographs to the results of the Liebefeld method and found no significant differences. This confirms earlier results (Imdorf et al. 1987). However, in our study, differences between the estimators occurred in the data collection, and we experienced the benefits and disadvantages of two different procedures. For beekeepers 1 and 2, the hives were evaluated directly in the field. Here, we disturbed the hives for a longer time; therefore, the homing bees were counted at the beginning and at the end of the estimation. However, no impact was reflected in the 3-way analysis according to the assessment order. In contrast, for the other six beekeepers, photographs were taken at the apiary and assessed later. This procedure is better documented and less disruptive for the colony, but it requires very good lighting and camera equipment. Only for beekeepers 1 and 2 were the empty cells counted in the honey supers. The proportion of used comb area could therefore not be calculated for honey supers. For these beekeepers, honey-filled cells in supers as well as in brood chambers were also counted as honey stores, even when they were not fully capped (= fresh nectar). The total proportion of used cells and the stores in the brood space are therefore higher for these beekeepers.

These biases could be avoided by a single scientist estimating all hives. However, Bargen et al. (2020) showed that five individual estimators introduced almost no individual bias to the estimation. The variation between the estimators is generally low, especially for parameters that appear in large numbers like adult workers, worker brood cells or sugar feed stores. Because treatment groups in our study were equally distributed among the two estimators, the impact of the bias is not a problem for the statistical evaluation of standardised values (cf. Bargen et al. 2020, p. 104).

The results of this study are highly robust but derive from only 1 year. Further research would be desirable to expand the dataset. In the project, we found that QEs in fact had a slight but significant impact on honey quality (Geier et al. in prep.), while there was no difference in workload (data not shown). However, we did not study, e.g., workload effects and labour organisation in beekeeping operations with long distances between apiaries. QEs are not guaranteed to always work: here, the worker bees of one hive seemed to feel queenless in the honey supers during the main swarming period, although a well-developed brood nest and a laying queen were present in the brood chamber. Also, young queens pass the grid at times. In a further step, we should complete the dataset focussing on specific BMPs, e.g. whether and how beekeepers use a follower board to adjust the available brood space. Future research should address the question of under which management regime hive additions make sense for practical beekeeping. Also, further research is needed to classify BMPs regarding the intensity of the beekeepers’ interventions (Sperandio et al. 2019). First attempts to link BMPs with the beekeepers’ goals have been made, e.g. by Underwood et al. (2019). They show that the selection of BMPs is not random but interlinked with operation size and beekeepers’ attitude towards in-hive chemicals. Even though QEs might not directly influence the egg-laying or foraging activity, the spatial composition of the brood, food and empty spheres of the hive might still be affected by synergistic effects. However, for a comprehensive analysis of these wider research questions, a much larger database covering several years would be needed. The development of imaging methods for the spatio-temporal development of the nest structure would be helpful. Further, we completely left out the cultural and socio-economic dimension, which is also of great relevance for a holistic assessment of BMPs.

We did not detect significant differences in adult bee numbers or brood development, but in practical beekeeping even slight differences (e.g. the gap in adult workers in May, compare Figure 1) can influence management decisions in the context of the socio-cultural background.

5 Conclusion

The use of queen excluders did not significantly alter the parameters of colony dynamics of biodynamically managed honeybee colonies in this study. Effects of queen excluders differed between the individual beekeeping companies, which may point towards an interaction with other BMPs. A larger database from several years would be needed to clarify this question.

Our study yields results concerning the biological dimension of the queen excluder as a BMP. Yet, with regard to the ongoing discussion about the queen excluder among beekeepers, it must be borne in mind that the decision for or against a BMP is complex and that other dimensions such as labour economics, product quality and socio-cultural background need to be considered as well.