1 Introduction

Mechanisms maintaining sociality even in the best-studied social insect species, such as the honeybee (Apis mellifera), are not yet completely understood. Since the seminal paper discussing alternative hypotheses (Keller and Nonacs 1993), debate has continued for more than 20 years over two conflicting explanations for the maintenance of sociality (and specifically, of reproductive skew) in insect societies. Reproductive skew is the asymmetry in the number of offspring produced by the dominant individual (the queen) and the subordinate ones (the workers). In many cases, the honeybee being one of them, reproduction is virtually monopolised by the queen. One of the explanations for this phenomenon is worker self-restraint in favour of the reproductive queen, induced by an honest signal of quality produced by the queen (van Zweden 2010). This explanation is also known as the “worker control” or “queen signal” hypothesis and is the most compatible with kin selection theory (Hamilton 1972). Its premise is that worker self-restraint and care for the queen are driven by indirect fitness gains. A corollary of this suggestion is that workers would behave differently towards queens providing different gains in indirect fitness. The alternative “queen control” hypothesis postulates that the queen manipulates the workers into sterility and induces them to care for her and her brood without honestly advertising her quality or providing workers with indirect fitness gains. This strategy is predicted to be evolutionarily unstable and to trigger an arms race in which workers would develop resistance and queens would constantly develop new means of suppression. Interestingly, the honeybee has been cited as an example of this latter possibility (Katzav-Gozansky et al. 1999; Katzav-Gozansky 2006), with queen mandibular pheromone (QMP) proposed as the candidate suppressive agent (Moritz et al. 2002; Strauss et al. 2008). Other studies, both old and recent, have shown, however, that a queen’s state and quality influence her ability to suppress worker reproduction (Jay 1970; Orlova et al. 2013).

The term “queen quality” (from the workers’ standpoint) can be defined as a combination of all the factors affecting the gains in inclusive fitness that the queen can provide to the workers. It includes first and foremost her reproductive capacity and her relatedness to workers, but also longevity, immune capacity and nutritional status. Many of these aspects of quality are determined during larval development, though studies on the effect of developmental conditions on queen-worker interactions have produced mixed results (Hatch et al. 1999; Tarpy et al. 2000). Nepotism in the honeybee has also been extensively examined elsewhere with mixed results (Tarpy et al. 2004), and some findings suggest that the recognition template for a queen’s relatedness is learned rather than innate (Breed 1981; Page and Erickson 1986). Our study focused on reproductive capacity (i.e. a queen’s fecundity and the sex of her offspring) as a primary measure of queen quality, and we addressed this question in the most direct way: by presenting workers with two queens of differing reproductive capacity and observing the workers’ behaviour towards each queen. To decouple fecundity from mating status and age, we introduced mated queens treated with oviduct blockage, which were incapable of laying eggs (reduced fecundity), in addition to virgin queens (unable to produce female offspring).

Our decision to use the novel simultaneous choice setup with live queens, and to introduce the previously unused procedure of oviduct blockage, was guided by the following predictions:

  1. Under the queen signal hypothesis, workers, given the opportunity, will prefer queens with the highest reproductive capacity. Intact mated queens will be preferred by workers over any other type, and queens with lowered reproductive capacity, either through reduced fecundity or through lack of mating, will resemble each other in (lack of) worker attraction. Under the queen control hypothesis, workers’ preference will be unrelated to queen reproductive capacity and guided solely by the queens’ pheromonal output.

  2. Under the queen signal hypothesis, pheromonal composition of queens will reflect their reproductive capacity and positively affect workers’ choice. Under the queen control hypothesis, pheromonal composition of queens will not reflect their reproductive capacity and will negatively affect workers’ choice (i.e. workers will seek to escape control).

Each of these two predictions has been partly addressed by previous studies discussed below from which we drew inspiration for the design of our methods and which provide the context for the interpretation of our results.

The simultaneous choice setup we employed was inspired, in part, by a plethora of studies on queen supersedure and duels. Honeybees are a monogynous species, and situations in which workers can interact with, and compare, two queens simultaneously occur in nature mostly in the context of queen supersedure and queen duels. Queen supersedure is a commonly observed phenomenon in honeybee hives in which workers raise a new queen to replace the old one while she is still alive. It has been extensively studied, though not in the context of the queen control/queen signal dichotomy (Butler 1957; Hatch et al. 1999; Tarpy et al. 2000; Tarpy et al. 2004). This phenomenon in itself suggests that workers are able to monitor queen quality and reach reproductive decisions based on that information. In particular, studies showing increased queen supersedure under adverse conditions such as Nosema infection or pesticide exposure (Furgala 1962; Sandrock et al. 2014) suggest that workers are capable of identifying and removing whatever they perceive as a “low-quality” queen, e.g. an infected queen or a queen affected by pesticides. During queen duels, when several virgin queens are present in the colony, workers also behave differentially towards duelling queens, although the factors guiding such discrimination have not been identified (Gilley 2001).

Another aspect of our study, the analysis of chemical signalling, was based on an array of previous findings on honeybee chemical ecology. Chemical signalling is the most important modality through which the queen and the workers communicate, and, as such, it has been extensively studied in a variety of species. Some unanswered questions, however, remain even in this well-studied area. Whether the compounds produced by the queen are honest and informative signals, and whether workers use them to gain information, is a part of the queen control/queen signal debate, and recent studies in other social insect species suggest queen signalling is indeed honest (Smith and Liebig 2017) although the mechanism through which this honesty evolved is as yet unknown. For the honeybee, a lot of emphasis has been put on the queen mandibular pheromone, a blend of compounds in the mandibular glands. While this signal was the first one to be identified and considered a cornerstone of honeybee sociality (Butler 1959; Butler et al. 1962; Butler 1966; Hoover et al. 2003; Hoover et al., 2005a, b), more recent studies identified an array of other compounds originating from different glandular sources, to play a part in the queen’s pheromonal bouquet (Wossler and Crewe, 1999a, b; Katzav-Gozansky et al. 2000; Katzav-Gozansky et al. 2002). Queen acceptance in the context of supersedure has also been linked to chemical signalling (DeGrandi-Hoffman et al. 2007). Esters are produced in the Dufour’s gland of queens and laying workers and are an important part of their chemical profile (Katzav-Gozansky et al., 1997a, b; Dor et al. 2005; Malka et al. 2008). Their role as an egg-marking signal has been debated (Katzav-Gozansky et al. 2001; Martin et al. 2002), but their presence and quantity in the gland was linked to ovarian development (Dor et al. 2005).

A variety of studies were performed to elucidate the influence of pheromones on queen-worker interaction in the honeybee. Studies focusing mainly on mating status showed that queen’s pheromonal output in the mandibular glands and the Dufour’s gland reflects the changes in her physiology (Plettner et al. 1997; Niño et al. 2013) and affects queen-worker interaction (Kocher et al. 2009; Richard et al. 2011). Other studies produced findings to the contrary (Strauss et al. 2008). The present study aimed to complement these findings by simultaneously testing worker preference for live queens of different reproductive capacity, pheromonal output of these queens, and the interaction between them. Queens’ mating status was previously found to affect queen-worker interactions and queens’ suppressive capabilities (Orlova et al. 2013), but direct worker preference for queens and aspects of reproductive potential other than mating status have not yet been examined. We predicted that if workers are consistently able to discriminate between queens and preferentially attend to the more fertile one, then queen’s chemical signals will honestly reflect the queens’ reproductive capacity and predict worker behaviour.

Since workers of a monogynous species such as the honeybee rarely compare queens directly, and do so only under the conditions described above, we believe that workers have an innate template of queen quality that is probably based on specific signal thresholds and is instrumental in queen supersedure and queen acceptance. Technically, since the events prior to queen supersedure are not readily observed, it is difficult to test the queen quality hypothesis in full-scale hives. We therefore tested it in a reduced system where worker behaviour could be closely monitored. This system also reduced the probability of queen-queen aggression, which is common during queen encounters and influences their outcome (Butz and Dietz 1994).

As our study examined reproductive capacity, signalling and worker behaviour simultaneously, our findings provide additional evidence for the interplay between queen quality, pheromonal output and the effect of these factors on worker behaviour, and add to the knowledge of mechanisms maintaining honeybee sociality.

2 Materials and methods

2.1 Animals

Queens and workers of A. m. ligustica were obtained from the Tzriffin Apiary, a research and development facility of the Ministry of Agriculture located in the coastal plain region in central Israel. Callow workers were obtained from pupae that emerged in the laboratory under controlled temperature and humidity conditions. Virgin queens emerged in the laboratory, whereas mated egg-laying queens were 1 year old and were obtained from commercial hives. All workers were sourced from one colony donated by the apiary to Tel Aviv University for research purposes. The workers were unrelated to either the virgin or the mated queens. All virgin queens originated from several matrilines and were reared in one colony specifically set up by the apiary for that purpose. Neither the matrilines from which the virgin queens originated nor the queen-rearing colony was related to the worker source colony or to the mated queens’ source colonies.

2.2 Housing

Experiments were conducted in small cages with transparent Perspex walls that were separated by a queen excluder to two compartments (each 13 cm × 11 cm × 7 cm) each fitted with a comb and housing one queen. Each cage was populated by 60 to 90 workers that could move freely between compartments, while the queens were restricted to their compartment. Contact between the two queens in each pair was thus restricted but not utterly impossible. Equal numbers of workers were introduced in each compartment before the introduction of queens. The experimental cages were kept under controlled temperature (30 °C), humidity (60%) and photoperiod (12L–12D) conditions with pollen and 50% sucrose solution ad libitum.

2.3 Experimental setup

Three queen types were used: virgin queens and two types of 1-year-old mated queens. Mated queens were either left intact or had their oviduct blocked in order to prevent them from egg-laying. For oviduct blocking, the queen was tethered to a rubber block ventral side up, and her oviduct opening was sealed with a drop of biocompatible glue (Opalith bee glue, Graze) to create a solid “plug”. The queen was kept tethered until the glue dried out and then returned to her compartment.

Five group types were established for the experiment: groups where both queens were virgin (henceforth VV, n = 3), groups where both queens were mated and left intact (henceforth MM, n = 4), groups where one queen was mated and intact and the other mated with blocked oviduct (henceforth MO, n = 8), groups where one queen was mated and intact and the other virgin (henceforth MV, n = 10) and groups where one queen was virgin and the other mated with blocked oviduct (henceforth VO, n = 8). In total, 33 pairs comprising 26 mated queens, 16 oviduct-blocked queens and 24 virgin queens were used in the experiment. The experiment was conducted for 21 days or until one of the queens died, whichever came first. At the end of the experiment, the queens were frozen at −80 °C and stored for further analysis, and the combs were examined for eggs or larvae. Group composition and behavioural data are summarised in Table I.
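
The queen totals quoted above follow directly from the pairing design. As a quick consistency check, the minimal Python sketch below tallies queens per type from the reported numbers of pairs. The one-letter codes and the function name `count_queens` are illustrative choices and do not come from the original methods.

```python
# Numbers of pairs per pairing type, as reported in the text.
# Each pairing name lists the two queen types it contains:
# M = mated intact, O = mated with blocked oviduct, V = virgin.
pairs = {"VV": 3, "MM": 4, "MO": 8, "MV": 10, "VO": 8}


def count_queens(pairs):
    """Return the number of queens of each type across all pairs."""
    totals = {"M": 0, "O": 0, "V": 0}
    for pairing, n_pairs in pairs.items():
        for queen_type in pairing:  # two letters, one per queen in the pair
            totals[queen_type] += n_pairs
    return totals


totals = count_queens(pairs)
n_pairs_total = sum(pairs.values())
print(n_pairs_total, totals)
```

The tally gives 33 pairs with 26 mated, 16 oviduct-blocked and 24 virgin queens, in agreement with the totals stated in the text.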

Table I Summary of the composition and sample size of different queen pairings, percentage of workers in the most preferred queen’s compartment and measures of preference asymmetry across queen types

2.4 Behavioural observations

Observations were performed twice daily, once in the morning (8:00–10:30) and once around noon (12:00–14:30), for 2 min each time in each cage, for 21 days to allow the full development of the virgin queens’ pheromonal signature (a total of 42 observations and a duration of 84 min for each cage). During the 2 min of observation, workers in each compartment were counted and the presence/absence of retinue behaviour and aggression was recorded. If egg-laying occurred, it was recorded as well. In groups where a queen died before the end of the experiment, the observation time was shorter than 21 days. For these, the number of observations varied from 6 (3 days) to 40 (20 days). If an aggressive interaction with workers was followed by the queen’s death, it was considered a regicide. During each observation, we counted the bees attending to each queen. We recorded the following behavioural parameters: (1) proportion of attending workers in each queen’s compartment throughout the experiment, (2) retinue behaviour, in which workers surround the queen and antennate her, groom her, feed her and follow her around the cage (presence/absence of retinue behaviour received a score of 1 or 0, respectively), and (3) aggression towards the queen manifested by biting, mauling or stinging attempts (presence/absence of aggression received a score of 1 or 0, respectively). We constructed an index of worker preference to incorporate all the parameters of worker behaviour towards the queen: I(pref) = P + R − A, where P is the proportion of workers, R is the retinue score and A is the aggression score. The average index and the average proportion of workers were calculated for each queen. The queen with the higher index of preference in each pair was termed the most preferred queen, while the queen with the lower index of preference was termed the least preferred queen. We also ranked the queens by the number of workers present in each queen’s compartment. In order to determine how asymmetric worker preference was, we calculated the difference in the percentage of workers present in the compartment of each queen: D = 100% × (P(most preferred) − P(least preferred)).
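
The index and the asymmetry measure can be illustrated with a short Python sketch. The observation values below are hypothetical, the names `preference_index` and `summarise_pair` are illustrative, and P is assumed here to be the share of all counted workers found in a queen's compartment; the original analysis was run in SPSS and may differ in detail.

```python
from statistics import mean


def preference_index(proportion, retinue, aggression):
    """I(pref) = P + R - A for one queen in one observation."""
    return proportion + retinue - aggression


# Hypothetical observations for one pair of queens, A and B. Each tuple is
# (workers with A, workers with B, retinue A, aggression A, retinue B, aggression B).
observations = [
    (60, 20, 1, 0, 0, 0),
    (70, 10, 1, 0, 0, 1),
    (50, 30, 1, 0, 1, 0),
]


def summarise_pair(observations):
    """Average P and I(pref) per queen, then derive ranking and asymmetry D."""
    props = {"A": [], "B": []}
    indices = {"A": [], "B": []}
    for n_a, n_b, ret_a, agg_a, ret_b, agg_b in observations:
        total = n_a + n_b
        p_a, p_b = n_a / total, n_b / total  # assumed definition of P
        props["A"].append(p_a)
        props["B"].append(p_b)
        indices["A"].append(preference_index(p_a, ret_a, agg_a))
        indices["B"].append(preference_index(p_b, ret_b, agg_b))
    mean_p = {queen: mean(values) for queen, values in props.items()}
    mean_i = {queen: mean(values) for queen, values in indices.items()}
    most = max(mean_i, key=mean_i.get)
    least = "B" if most == "A" else "A"
    # D = 100% x (P(most preferred) - P(least preferred))
    d = 100 * (mean_p[most] - mean_p[least])
    return mean_i, most, d


mean_i, most, d = summarise_pair(observations)
print(mean_i, most, d)
```

Because aggression is subtracted, a queen can receive a negative index in a single observation even when some workers are present in her compartment.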

2.5 Dissection and ovary measurements

Queen abdomens were dissected on dry ice. Dissection proceeded as follows: 4–6 sternites were removed from the queen’s abdomen. The abdomen was then allowed to thaw and the ovaries and Dufour’s gland were dissected out. Ovaries were immersed in 200-μl Eppendorf tubes filled with 50 μl of 100% ethanol. Each tube with the liquid was weighed before and after the ovaries were added, and the difference in mass was recorded as the net mass of ovaries. Then, the length of all terminal oocytes was measured. The longest terminal oocyte was recorded for use in further analysis. Dufour’s gland was extracted in 50 μl dichloromethane containing 100 ng eicosane as internal standard as described in Orlova et al. (2013). Heads were dissected on dry ice separately. From each head, both mandibular glands were recovered on dry ice by peeling off the cuticle around the mandibles until the gland was exposed. The gland and the mandible were then separated from the head, the muscles and connective tissues were removed, and the gland was then gently separated from the mandible. The gland destined for chemical analysis was placed in a vial with 50 μl methanol containing 1 μg decanoic acid as internal standard. Queens that were found dead during the experiment, and their counterparts, were excluded from the analysis of physiological parameters.

2.6 Chemical analyses of glandular contents

The contents of Dufour’s and mandibular glands of queens were analysed by gas chromatography (GC). Samples for analysis of Dufour’s gland contents were concentrated by gentle evaporation to a final volume of 3 μl, of which 1 μl was injected into the GC (Varian CP 3800) equipped with a VM-5 fused silica column. The injection was performed in splitless mode at a flow rate of 1.5 ml/min with a temperature program from 150 °C (1 min of initial hold) at 5 °C/min to 300 °C with a final hold of 10 min. Chemical composition of Dufour’s gland secretion was verified using GC/MS according to Katzav-Gozansky et al. (1997a, b). A chromatogram with peaks identified by GC/MS was used as a reference. In addition, external standards consisting of a series of synthetic hydrocarbons and esters identical to those found in the Dufour’s gland were run using the same method to serve as a reference and to control for possible shifts in retention times. Quantification was achieved by peak integration (Galaxie chromatography software, Varian Inc.), compared to the internal standard (100 ng eicosane). Chromatograms that displayed very low quantities of both the internal standard and the glandular components were considered faulty injections and were excluded from further analysis. Altogether, 3 samples were thus excluded: 1 mated queen, 1 oviduct-blocked queen and 1 virgin queen. A complete list of compounds in the Dufour’s gland can be found in Katzav-Gozansky et al. (1997) and Dor et al. (2005).
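
Quantification against an internal standard can be sketched as follows. This is a minimal Python illustration that assumes an equal detector response for the analytes and the internal standard (a response factor of 1), which the text does not state. The peak areas are hypothetical and the function name `quantify` is illustrative.

```python
def quantify(peak_areas, internal_standard_area, internal_standard_ng=100.0):
    """Estimate compound amounts (ng) from integrated peak areas.

    Assumes equal detector response for analytes and internal standard;
    this response factor of 1 is an assumption made for the sketch.
    """
    if internal_standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return {
        name: area / internal_standard_area * internal_standard_ng
        for name, area in peak_areas.items()
    }


# Hypothetical integrated peak areas (arbitrary units) for one gland extract.
areas = {"esters": 5200.0, "hydrocarbons": 2600.0}
amounts = quantify(areas, internal_standard_area=1300.0)

# The esters/hydrocarbons ratio does not depend on the internal standard,
# because the standard's area cancels out.
ratio = amounts["esters"] / amounts["hydrocarbons"]
print(amounts, ratio)
```

Under this assumption a faulty injection, in which both the internal standard and the glandular peaks are very small, yields unreliable estimates, which is consistent with excluding such chromatograms.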

Samples for mandibular gland analyses were evaporated to dryness, followed by silylation using 25 μl of BSTFA. The vials were incubated at room temperature for 2 h with gentle shaking on a lab rotator. After 2 h, the reaction was stopped by adding 200 μl of hexane, and the samples thus treated were injected into the GC within 48 h. Samples of mandibular glands were also concentrated by gentle evaporation to a final volume of 3 μl, of which 1 μl was injected into the GC. The column temperature was programmed from 90 to 200 °C at a rate of 3 °C/min followed by a rate of 15 °C/min to 300 °C, at which it was held for 10 min. Chemical composition of mandibular gland secretion was verified using GC/MS according to Malka et al. (2007). A chromatogram with peaks identified by GC/MS was used as a reference. In addition, external standards consisting of a series of synthetic acids identical to those found in the mandibular glands were run using the same method to serve as a reference and to control for possible shifts in retention times. Quantification was obtained by peak integration using Galaxie software, compared to the internal standard (1 μg decanoic acid). No mandibular gland sample was discarded as a faulty injection.

10-Hydroxydecanoic acid (10-HDAA), 8-hydroxyoctanoic acid (8-HOA) and 10-hydroxy-2-decenoic acid (10-HDA) were termed “worker-like mandibular substances (WMS)”, whereas HOB, HVA, 9-hydroxy-2-decenoic acid (9-HDA) and 9-oxo-2-decenoic acid (9-ODA) were termed “queen-like mandibular substances (QMS)”.

2.7 Statistical analyses

All statistical analyses were performed using IBM SPSS v.21 software. We present descriptive statistics for specific variables. We used chi-square tests for frequency analyses, e.g. frequency of preference for a certain queen type. We used a generalised linear mixed model (GLMM) for analysis of behavioural data. All analyses were followed by a post hoc least significant difference (LSD) comparison. As mentioned above, queens that were found dead, as well as their counterparts, were excluded from the analysis of physiological parameters. The sample size for physiological data is therefore smaller than that for behavioural data, and we used a statistical procedure that does not assume normal distribution of the data: the generalised estimating equations (GEE) procedure. For all physiological parameters, standardised scores (Z-scores) were also calculated, and subsequent analysis was performed on these scores. For analyses of behavioural data collected over a number of days, we constructed all models using a repeated-measures data structure with pair as a subject effect and queen and day as within-subject effects. For analysis of physiological data, we constructed models using a repeated-measures data structure with pair as a subject effect and queen as a within-subject effect to account for dependence of parameters of queens within one pair.
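
The standardisation step can be illustrated with a minimal Python sketch. It assumes the sample standard deviation (n − 1 denominator) is used; the text does not specify this. The ovary mass values are hypothetical and the function name `z_scores` is illustrative.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption here
    if s == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical ovary masses (g) for five queens.
ovary_mass_g = [0.040, 0.027, 0.007, 0.035, 0.010]
z = z_scores(ovary_mass_g)
print([round(score, 3) for score in z])
```

Standardising both predictors and outcome places the regression coefficients of different physiological parameters on a common, unit-free scale, which makes them easier to compare.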

3 Results

3.1 Queen type, rank and worker preference

As described in Sect. 2, each queen in a pair was assigned a rank according to an overall index of worker preference and also according solely to the proportion of workers present in her compartment. A total of 33 queen pairs were analysed. The ranking coincided in all pairs but one (a pair of virgin queens, VV2). The mean difference in worker percentage was 68.88% and the median was 73.3%. Only 3 pairs (8.8% of the total) had mean difference of 20% or less, while 9 pairs (26% of all pairs) had mean difference of 90% or more. The minimum difference in attendance was 2%, and the maximum difference was 100% (i.e., the least preferred queen was alone in her compartment). Difference in worker percentage was also analysed separately for each pair type. The results of analysis are summarised in Table I. These results show that in most pairs one queen was preferred over another by a large margin and lack of preference for one queen or another was rare.

To determine whether one queen type was consistently preferred over the other in each pair type, we conducted a chi-square analysis. Since in MM and VV groups queen type was the same, each queen was randomly assigned a number 1 or 2. In MO and MV pairs, mated queens were consistently preferred over oviduct-blocked and virgin queens, respectively (Pearson χ2 = 4, df = 1, p = 0.046, for MO groups, Pearson χ2 = 7.2, df = 1, p = 0.007, for MV groups). In VO pairs, there was no consistent preference for one queen type over another (Pearson χ2 = 0.1, df = 1, p = 0.317). In MM and VV pairs, the randomly assigned number was not associated with increased preference (Pearson χ2 = 2, df = 1, p = 0.157, for MM groups, Pearson χ2 = 0.667, df = 1, p = 0.424 for VV groups). The results of this analysis are summarised in Table I.
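
For readers unfamiliar with this kind of frequency test, the following Python sketch shows a chi-square goodness-of-fit test against equal preference with one degree of freedom. The counts are hypothetical, the function name is illustrative, and the exact test specification used in SPSS is not given in the text, so this may not match the original procedure.

```python
import math


def chi_square_equal_preference(n_first, n_second):
    """Goodness-of-fit test against a 50:50 split (df = 1).

    Returns the chi-square statistic and its p value. For df = 1 the
    survival function of the chi-square distribution is erfc(sqrt(x / 2)).
    """
    total = n_first + n_second
    expected = total / 2
    chi2 = ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical example: in 10 pairs, queen type 1 is preferred 8 times.
chi2, p = chi_square_equal_preference(8, 2)
print(round(chi2, 3), round(p, 4))
```

With small numbers of pairs the chi-square approximation is rough, so an exact binomial test would be a reasonable alternative in practice.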

The index of preference was analysed using GLMM with pairing type × queen type interaction as a fixed factor and day(pair(pair type)), i.e. day nested within pair ID nested within pair type, as a random factor. A total of 608 observations from 33 pairs representing 5 pairing types were included in the analysis. The effect of pairing type × queen type interaction was significant (GLMM, F = 18.04, df1 = 9, df2 = 598, p < 0.0001). The random effect was not significant (Z = 0.195, p = 0.846). Post hoc LSD analysis used queen type as a contrast field. Contrasts were significant for MO and MV groups, but not for VO groups (post hoc LSD, p = 0.044, p < 0.0001, and p = 0.394, respectively) (Figure 1). Contrasts were not calculated for MM and VV pairs since both queens were of the same type. Thus, the results of GLMM analysis coincide with the results of the more conservative chi-square analysis.

Figure 1. Attraction index of different queen types within MO, MV and VO pairs. p values from GLMM analysis followed by post hoc LSD analysis are indicated above each pairing type. Error bars are based on the standard error of each sample. Sample sizes (total number of observations) are indicated within each bar.

The parameters comprising the index of preference—proportion of workers in each compartment, retinue and aggression—are found in Supplementary material (Table S1).

3.2 Queen mortality and egg laying

Queen mortality was observed during the experiment. If a queen’s death was preceded by significant worker aggression towards her, the death was considered a regicide. In 2 cases, regicide of a virgin queen was noted during observations, the stinging worker was identified and the sting was recovered. If no aggression from workers preceded the queen’s death, it was termed a natural death. Killing of one queen by another queen was unlikely since contact between queens was limited, but it could not be absolutely excluded. Mated queens had the lowest mortality incidence (3 out of 26, 1 possible regicide). Oviduct-blocked queens and virgin queens had higher mortality rates (5 out of 16 and 10 out of 24, respectively). In oviduct-blocked queens, 3 deaths were preceded by aggression and termed regicides, while 2 were considered natural deaths. In virgin queens, 8 out of 10 deaths were preceded by significant aggression and termed regicides. Differences in overall mortality between queen types did not reach statistical significance (Pearson χ2 = 5.880, df = 2, p = 0.053), but differences in regicide rates were significant (Pearson χ2 = 7.229, df = 2, p = 0.026). No significant worker mortality was observed during the experiment.
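
The overall mortality comparison can be illustrated with a Pearson chi-square test on a 3 × 2 contingency table built from the counts given above (deaths out of 26 mated, 16 oviduct-blocked and 24 virgin queens). The Python sketch below is an illustration, not the original SPSS procedure, and the function name `pearson_chi_square` is an illustrative choice.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: mated, oviduct-blocked, virgin. Columns: died, survived.
mortality = [[3, 23], [5, 11], [10, 14]]
chi2, df = pearson_chi_square(mortality)

# For df = 2 the chi-square survival function reduces to exp(-x / 2).
p = math.exp(-chi2 / 2)
print(round(chi2, 3), df, round(p, 3))
```

Run on these counts, the sketch gives values close to the reported χ2 = 5.880 and p = 0.053 for overall mortality. The closed-form p value used here holds only for two degrees of freedom.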

Egg laying was observed surprisingly rarely throughout the experiment. Only two out of 26 mated queens laid eggs. Also surprisingly, two virgin queens developed ovaries and laid presumably haploid eggs. All laying queens had a higher preference index than their counterparts.

3.3 Physiological parameters and queen type

We compared total ovary mass, terminal oocyte length, esters/HC ratio and QMS amounts across queen types. As mentioned in Sect. 2, queens that were found dead, as well as their counterparts, were excluded from the analysis of physiological parameters. The sample size for this analysis is therefore smaller than that for the behavioural analysis, and we used a statistical procedure that does not assume normal distribution of the data. For this, we constructed a number of GEE models with queen type as the predictor variable and each of the physiological and chemical parameters as the dependent variable. We used a repeated-measures data structure with pair as a subject variable and queen as a within-subject variable to account for dependency between queens in each pair. We used an unstructured covariance matrix to avoid the assumption of independence between subjects. The complete dataset with all the variables we tested can be found in Supplementary Table S2.

Total ovary mass differed significantly between the three queen types (GEE, n = 34, Wald χ2 = 188.270, df = 2, p < 0.0001, mean = 0.0396 ± 0.002 g for mated, mean = 0.0271 ± 0.003 g for oviduct blocked, and mean = 0.0071 ± 0.0007 g for virgin, p < 0.0001 for all post hoc comparisons). Terminal oocyte length also differed significantly between queen types (GEE, n = 34, Wald χ2 = 63.132, df = 2, p < 0.0001, mean = 1.08 ± 0.081 mm for mated, mean = 0.875 ± 0.071 mm for oviduct blocked, and mean = 0.145 ± 0.084 mm for virgin, post hoc LSD: p = 0.029 for mated vs. oviduct blocked, p < 0.0001 for virgin vs. mated and virgin vs. oviduct blocked). Upon examination, the ovaries of oviduct-blocked queens also showed signs of oocyte resorption.

Esters/hydrocarbons ratio was also significantly different between queen types (GEE, n = 31, Wald χ2 = 10.606, df = 2, p = 0.005, mean = 2.27 ± 0.361 for mated, mean = 0.830 ± 0.280 for oviduct blocked, and mean = 1.31 ± 0.138 for virgin). Post hoc analysis revealed that mated queens differed from virgin and oviduct-blocked queens, but the latter two types did not differ (post hoc LSD, p = 0.001 for mated vs. oviduct blocked, p = 0.02 for mated vs. virgin, p = 0.149 for virgin vs. oviduct-blocked). Mean absolute amounts of hydrocarbons per gland equalled 4.35 ± 1.27 μg, 4.50 ± 1.32 μg and 5.45 ± 1.6 μg for mated, oviduct-blocked and virgin queens, respectively. Mean absolute amounts of esters per gland equalled 9.58 ± 2.33 μg, 2.72 ± 0.86 μg and 6.56 ± 1.73 μg for mated, oviduct-blocked and virgin queens, respectively.

QMS amounts were not significantly different between different queen types, although mated queens showed slightly higher, and virgin queens slightly lower amounts than other types (GEE, n = 34, Wald χ2 = 1.545, df = 2, p = 0.462, mean = 18.6 ± 3.48 μg for mated, mean = 16.5 ± 6.20 μg for oviduct blocked, and mean = 13.05 ± 2.81 μg for virgin). WMS amounts were not significantly different between different queen types, although virgin queens displayed slightly higher amounts than other types (GEE, n = 34, Wald χ2 = 0.231, df = 2, p = 0.891, mean = 3.85 ± 0.7 μg for mated, mean = 3.42 ± 1.4 μg for oviduct blocked, and mean = 4.18 ± 1.3 μg for virgin).

Data on physiological parameters and relevant charts can be found in Supplementary Table S2.

3.4 Physiological parameters and preference

We analysed how physiological and chemical parameters of the queen predicted the worker attraction index. For this, we constructed a GEE model with each of the physiological parameters entered as a continuous covariate (predictor variable) and the attraction index as the dependent variable. We used a repeated-measures data structure with pair as a subject variable and queen as a within-subject variable to account for dependency between queens in each pair. We used an unstructured covariance matrix to avoid the assumption of independence between subjects. Both the attraction index and the physiological parameters were transformed to standardised Z-scores, and the Z-scores were used for analysis. Results of the analysis are listed below.

Both total ovaries mass and terminal oocyte length were significant predictors of preference index (GEE, n = 34, β = 0.424 ± 0.18, intercept = 0.078, Wald χ2 = 5.224, df = 1, p = 0.022 for ovaries mass and GEE, n = 34, β = 0.478 ± 0.22, intercept = 0.044, Wald χ2 = 4.347, df = 1, p = 0.037 for oocyte length). Terminal oocyte length was a slightly better predictor of attraction index than total ovaries mass (∆QICC > 1).

Total ovary mass did not show significant interaction with queen type (GEE, n = 34, Wald χ2 = 5.378, df = 3, p = 0.117). Oocyte length showed significant interaction with queen type (GEE, n = 34, Wald χ2 = 9.902, df = 3, p = 0.019, β = 0.714 ± 0.27, β = 0.575 ± 0.34 and β = 0.335 ± 0.40 for mated, oviduct-blocked and virgin queens, respectively) (Figure 2a). Oocyte length alone was a better predictor of attraction index than oocyte length × queen type interaction (∆QICC > 3).

Figure 2. (a) Relationship between attraction index and total ovary mass. (b) Relationship between attraction index and terminal oocyte length. In both panels, different dot shapes represent different queen types, and the regression line and 95% confidence intervals are displayed. The lines are derived from a linear relationship and do not reflect the repeated-measures design and between-subject dependency. Sample size is indicated in the legend.

3.5 Gland chemistry and preference

Esters/HC ratio in the Dufour’s gland and QMS amounts in the mandibular glands also significantly predicted the attraction index (GEE, n = 31, β = 0.237 ± 0.09, intercept = 0.108, Wald χ2 = 6.733, df = 1, p = 0.009 for esters/HC ratio and GEE, n = 34, β = 1.011 ± 0.40, intercept = − 0.006, Wald χ2 = 5.899, df = 1, p = 0.015 for QMS amounts) (Figure 3a, b). Esters/HC ratio was a better predictor of attraction index than QMS amount (∆QICC > 3). Both esters/HC ratio and QMS amounts showed significant interaction with queen type (GEE, n = 31, Wald χ2 = 8.229, df = 3, p = 0.042, β = 0.276 ± 0.13, β = 0.563 ± 0.35 and β = 1.76 ± 1.9 for mated, oviduct-blocked and virgin queens, respectively, for esters/HC ratios and GEE, n = 31, Wald χ2 = 60.75, df = 3, p < 0.0001, β = − 0.616 ± 0.34, β = 0.387 ± 0.28 and β = 2.13 ± 0.31 for mated, oviduct-blocked and virgin queens, respectively, for QMS amounts).

Figure 3.

a Relationship between attraction index and esters/hydrocarbons ratio in the Dufour’s gland. Different dot shapes represent different queen types. Regression line and 95% confidence intervals are displayed. The lines are derived from linear relationship and do not reflect the repeated measures design and between-subject dependency. Sample size is indicated in the legend. b Relationship between attraction index and QMS amounts in mandibular glands. Different dot shapes represent different queen types. Regression line and 95% confidence intervals are displayed. The lines are derived from linear relationship and do not reflect the repeated measures design and between-subject dependency. Sample size is indicated in the legend.

3.6 Physiological parameters and gland chemistry

We also tested whether physiological parameters predicted pheromonal ones. We found that terminal oocyte length predicted the esters/HC ratio as expected (GEE, n = 31, β = 0.394 ± 0.18, intercept = 0.014, Wald χ2 = 4.569, df = 1, p = 0.033) but total ovary mass did not (GEE, n = 31, β = 0.263 ± 0.18, intercept = 0.074, Wald χ2 = 2.042, df = 1, p = 0.153). Terminal oocyte length did not predict QMS amounts (GEE, n = 34, β = 0.24 ± 0.22, intercept = − 0.079, Wald χ2 = 1.105, df = 1, p = 0.293). Surprisingly, total ovary mass significantly predicted the QMS amounts (GEE, n = 34, β = 0.914 ± 0.27, intercept = − 0.159, Wald χ2 = 11.15, df = 1, p = 0.001).

4 Discussion

The objectives of our study were to test worker behaviour towards queens of different types and reproductive capacities, and to analyse the relationship between reproductive parameters and pheromonal output and the relationship between queens’ pheromonal output and worker behaviour. We predicted that

  1. Under the queen signal hypothesis, workers, given the opportunity, will prefer queens with the highest reproductive capacity. Intact mated queens will be preferred by workers over any other type, and queens with lowered reproductive capacity, either through reduced fecundity or through lack of mating, will resemble each other in (lack of) worker attraction. Under the queen control hypothesis, workers’ preference will be unrelated to queen reproductive capacity and guided solely by the queen’s pheromonal output.

  2. Under the queen signal hypothesis, pheromonal composition of queens will reflect their reproductive capacity and positively affect workers’ choice. Under the queen control hypothesis, pheromonal composition of queens will not reflect their reproductive capacity and will negatively affect workers’ choice (i.e. workers will seek to escape control).

Overall, our results matched our predictions under the queen signal hypothesis. Workers indeed showed increased preference towards queens with higher reproductive ability, similarly to previous findings on the subject (Kocher et al. 2009; Orlova et al. 2013; Rangel et al. 2016), and did not discriminate between reproductively impaired queens, regardless of the cause of such impairment (as illustrated in Figure 1). Together, these findings suggest that workers are able to perceive the reproductive quality of the queen and identify a queen promising larger or smaller gains in their inclusive fitness. The fact that preference index was significantly predicted by both ovarian mass and oocyte length, as illustrated in Figure 2, further corroborates this idea.

Oviduct blockage proved to be an effective method to decouple reproductive capacity from mating status (and possibly to simulate failing queens). Oviduct-blocked queens indeed showed decreased ovary mass, decreased terminal oocyte length (p < 0.0001 and p = 0.029 for comparison with intact mated queens) and signs of oocyte resorption. One might argue that oviduct-blocked queens were rejected by workers on account of their poor health. Even if this were so, the results would still corroborate our prediction that workers are sensitive to the queen’s quality. However, our findings on ovary mass, terminal oocyte size and oocyte resorption, and the fact that mortality in oviduct-blocked queens was comparable to that in intact virgin queens, indicate that the treatment impaired their reproductive capacity rather than other aspects of their physiology.

More importantly, Dufour’s gland content in oviduct-blocked queens was indistinguishable from that in virgin queens but differed from that in mated queens (p = 0.149 and p = 0.02, respectively). This finding matches our second prediction: if workers are able to discriminate between queens, the queens’ pheromonal signatures are honest signals of their reproductive capacity and likely provide the basis for that discrimination. Indeed, terminal oocyte length was a significant predictor of esters/HC ratios (p = 0.033), and these ratios, in turn, were significant predictors of worker preference (Figure 3a). This suggests that esters/HC ratios are an honest signal of fertility in queens, specifically of readiness to lay eggs, and that it is through them that workers assess the reproductive capacity of a queen. This finding complements earlier findings showing that insemination quality and quantity (bearing a direct effect on egg production) affect the composition of the Dufour’s gland and thus influence worker behaviour (Richard et al. 2011).

Curiously, QMS amounts, while not being linked to oocyte length and not differing significantly between queen types, were significantly predicted by ovary mass and did, in turn, predict worker preference positively (Figure 3b), i.e. workers were not trying to avoid queens secreting more of a “controlling” substance. While the latter fact is not surprising, it further corroborates the idea that worker behaviour is informed by chemical signalling, and that QMS amounts in the honeybee are apparently a caste recognition signal rather than a fertility signal. This finding is in agreement with earlier studies on the subject (Malka et al. 2007; Malka et al. 2009). The fact that QMS amounts are predicted by ovary mass seemingly contradicts earlier findings on the subject (Peso et al. 2013). The source of this contradiction might lie in the fact that our study analysed interindividual variation between queens rather than comparing groups.

The finding that ovary mass and terminal oocyte length are linked to different signals suggests that these signals reflect different aspects of quality. Ovary mass in the honeybee is determined by developmental conditions and results from an array of influences during a female’s larval development (Linksvayer et al. 2011; Gajger et al. 2017). It can be stated that the ovary mass of honeybee females is determined by the same factors as the female’s caste itself. Therefore, it is expected that a caste-related trait would be linked to a caste-related signal (Plettner et al. 1993; Plettner et al. 1996), although in workers the signal is apparently plastic but not expressed to the full extent seen in queens (Malka et al. 2007).

Dufour’s gland esters, on the other hand, show all the properties of an honest signal of fertility that transcends caste and is related to changes occurring in the female honeybee’s body during her adult life. This signal is dynamic and changes in response to changing conditions even on a short time scale (Richard et al. 2011).

It should be noted, however, that while correlative evidence in this and other studies links Dufour’s and mandibular gland secretion to various aspects of queen physiology, the mechanistic nature of this link is still unclear. Further work is required to elucidate the connection between pheromones and physiological traits, and how this connection is relevant to queen-worker interaction.

Overall, our findings demonstrate that both caste-related and fertility-related signals are instrumental in directing worker behaviour towards queens. This result is in agreement with an array of previous findings cited above and adds to our understanding of mechanisms governing bee sociality. Also, our results suggest that both Dufour’s gland esters and QMS are honest indicators of queen quality, albeit different aspects of it.

In conclusion, our results support the “queen signal” rather than the “queen control” explanation of bee sociality maintenance. Workers in our study were able to accurately perceive queen quality rather than being manipulated into caring for a low-quality queen, and the signals produced by the queen that inform worker behaviour honestly reflect different aspects of queen reproductive potential rather than being redundant means of control. However, further work is needed to better understand the mechanisms responsible for synthesis of chemical signals, the neural basis of their perception in workers and the link between the signals and the qualities they advertise. We believe that our study can provide some insight into the still poorly understood aspects of bee sociality.