1 Introduction

Worldwide, there are few organized breeding programs for honeybees (Apis mellifera), with a few notable exceptions, such as in Austria and Germany (Rinderer et al. 2010; Danka et al. 2016; Brascamp et al. 2016; Hoppe et al. 2020). Nevertheless, there is growing interest in selective breeding of the species, resulting in the emergence of new breeding projects. Engaging in selective breeding can enable a group of beekeepers to maintain and develop original stock well adapted to their specific management practices. Breeding usually focuses on classical traits: increasing honey production, improving ease of handling, and reducing the tendency to swarm.

In addition to these classical selection objectives, there is a strong emphasis on improving traits related to resilience of the colonies. In farm animals, resilience is defined as the capacity of animals (here colonies) to remain productive when exposed to environmental or infectious challenges (Colditz and Hine 2016). Part of the effort to improve resilience is linked to disease resistance, including through the control of the parasitic mite Varroa destructor (Rinderer et al. 2010; Danka et al. 2016; Büchler et al. 2020; Guichard et al. 2020a). Furthermore, there is also a growing interest in enhancing feed autonomy of colonies when facing fluctuating feed resources, as feed shortages are common in a wide range of landscapes (e.g., Czekońska et al. 2023). In such cases, feed autonomy is the capacity of a colony to survive and become productive again as soon as the feed shortage is over, without needing emergency sugar feeding. Concern over feed autonomy may be further exacerbated by climate change and the reduced diversity of melliferous plants in agricultural landscapes (Decourtye et al. 2010; Goulson et al. 2015). Other reasons for considering feed autonomy as a breeding goal trait include the general intensification of beekeeping (e.g., more fecund stock, several harvests per season); the increase in sugar prices; and stricter controls on honey purity by international dealers, who try to avoid buying adulterated honey and might detect trace amounts of emergency feed (Ždiniaková et al. 2023).

In the context of emerging breeding schemes and new traits, colony performance records and associated pedigrees are only available on relatively small populations with few generations of trait recording (e.g., Guichard et al. 2020b). Moreover, important traits are usually thought to arise genetically from effects jointly expressed by the queen and its worker group. Consequently, the colony phenotype is usually described by a mixed model containing two random genetic effects: an effect expressed by the queen and an effect of the worker group (Chevalet and Cornuet 1982; Bienefeld et al. 1989). We will refer to this model as the Colony Model (CM). Distinguishing between the genetic effects of the queen and of the workers on colony performance, however, is only possible when the two castes have distinct pedigrees and the dataset is large enough. In datasets from small populations, with limited pedigree depth, or containing only records from colonies with open-mated queens that are not used to produce queens for subsequent generations, the CM cannot be fitted. To still obtain information on the heritability of traits, reduced models can be used that include only the genetic effect of either the queen or the worker group of the colony (Guichard et al. 2020b; Du et al. 2022; Basso et al. 2024).

Using a reduced model with only a queen effect (Queen Model: QM), we will present genetic parameter estimates on a small population from an emerging breeding scheme managed by the Association for the Development of Beekeeping in the Provence region (ADAPI) in southern France. Their breeding objective focuses on improving honey yield and resilience to diseases and to fluctuating feed resources, for which a new selection criterion was developed.

2 Material and methods

2.1 Breeding population

During the first 3 years of initialization of the breeding program (2019–2021), the population under study was of Buckfast origin (a synthetic breed), partially hybridized with Caucasian and dark honeybees (for a description of honeybee subspecies or breeds, see Adam (1987) and Ruttner (2013)). Founders were chosen primarily for good honey production, feed autonomy, or V. destructor resilience.

From 2019 to 2021, 14 to 16 breeding queens (BQs) (Table I) were used each year to produce offspring queen families. In total, there were 42 BQs, of which 26 were used to produce offspring queens tested on several apiaries, while the others were private BQs used to produce an offspring sister-group tested in a single apiary.

Table I Number of apiaries, sister-groups, and colonies per year

Genetic links from one generation to the other existed through the dam and sire path and are detailed in Supplementary Figure S1.

Two types of daughters were produced by BQs. The first one was composed of potential new BQs that were inseminated. To allow for a short generation interval, BQs were not phenotyped or selected. The second one was composed of open-mated daughters. Shared open-mated daughters were produced by five (in 2019) or four (in 2020 and 2021) queen producers and mated freely with drones from the area of each queen producer. They were then distributed to the performance-testing beekeepers and will be referred to as shared Testing Queens (TQs). In addition to these shared TQ sister-groups, private TQ sister-groups (i.e., not shared between beekeepers) were open-mated and tested by their owners. TQs from a dam used in 2020 were used as drone-producing queens for the insemination of BQs used in 2021 (see Supplementary Figure S1 for details). All other TQs were only phenotyped and not used for mating.

2.2 Testing protocol

Each year from 2020 to 2022, 13 to 15 beekeepers managed their own testing apiary using the same testing protocol and otherwise maintained their normal production management. Colonies were migratory, except those managed by one of the beekeepers in 2021. Each apiary generally tested three shared TQ sister-groups, or at least two. Each shared TQ sister-group was tested on three to five apiaries to ensure a sufficient genetic connection between apiaries. In addition to the shared TQ sister-groups, around half of the beekeepers tested a private TQ sister-group, present only in their own apiary. TQs were introduced in homogenized queen-less colonies to be performance-tested in the following season (Büchler et al. 2013). After winter mortality, requeening, swarming, or colony collapse due to diseases, a total of 1022 colonies remained with phenotypes over the whole beekeeping season, with a minimum of 330 per year (Table I).

Phenotyping during the production season started at the end of winter, in March–April, and ended in mid-summer (mid-August) after the lavender bloom, the main targeted honey flow. Honey yield (HONEY) was measured as the sum of all honey harvested from supers during the whole season. Colonies that produced zero kg of honey but otherwise showed no obvious disease symptoms, requeening, swarming, or other accidental events were considered genetically informative and kept in the analysis.

To record traits and collect samples, trained operators visited all colonies during three periods per year: end of winter (March to April), spring (late April to May), and early summer (June to mid-July). Six traits were recorded at least two times in a year. We used the phenotyping protocols described in Büchler et al. (2013) for the common traits gentleness (GENT), calmness during inspection (CALM), hygienic behavior (HYG), and phoretic V. destructor load (VARROA), with some modifications of the protocols for HYG and VARROA explained in the following paragraph. In addition, we visually assessed the total capped worker brood surface (BROOD) using the protocol defined by Hernandez et al. (2020). The protocol for the newly introduced trait representing honey reserves in the brood chamber (RESERVES) is detailed in Appendix A and aims at evaluating the quantity of honey stored in the brood chamber on a one-to-four scale. As these feed reserves are not harvested, unlike those stored in the supers, colonies could rely on them during honey flow shortages. This new trait was used to assess the potential feed autonomy of the colonies.

For HYG, 50 or 100 capped brood cells in a patch with white to purple-eyed nymphs were pierced with an entomological needle (0.45 mm in diameter) before the comb was returned to the colony. After a waiting time, the pinned brood cells with partially removed brood were summed together with those totally cleaned to calculate the clearance rate numerator, as, e.g., in Eynard et al. (2020). To enable a complete phenotyping of an apiary in a single visit, the waiting time between pin-killing and brood removal measurement was shortened from the usually recommended 12 or 24 h (Büchler et al. 2013) to 5 h at the end of winter and to 3.5 h in spring.
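The HYG score described above can be sketched as a simple percentage. This is a minimal illustration, assuming the denominator is the number of pierced cells (50 or 100); the function and argument names are hypothetical and not taken from the protocol.

```python
def hyg_score(partially_removed: int, totally_removed: int, pierced: int) -> float:
    """Percentage of pin-killed cells that were partially or totally cleared.

    Assumption: the denominator is the number of pierced cells (50 or 100).
    """
    if pierced <= 0:
        raise ValueError("pierced must be a positive number of cells")
    cleared = partially_removed + totally_removed
    if partially_removed < 0 or totally_removed < 0 or cleared > pierced:
        raise ValueError("cleared cells must lie between 0 and the number pierced")
    return 100.0 * cleared / pierced
```

For instance, a colony with 10 partially and 15 totally cleared cells out of 50 pierced cells would score 50%.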

For VARROA, around 30 g samples of adult worker bees per colony and per visit were collected on open brood combs and then frozen, weighed, and washed with soapy water to detach mites from the workers (Dietemann et al. 2013). The phoretic V. destructor load was expressed per 100 bees by considering the average weight of a worker to be 0.14 g. A first infestation rate was estimated at the end of winter on all colonies to verify the initial absence of detectable mites.
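The conversion from a mite count and a sample weight to a load per 100 bees can be sketched as follows. Only the 0.14 g average worker weight comes from the text; the function and variable names are illustrative.

```python
MEAN_WORKER_WEIGHT_G = 0.14  # average worker weight assumed in the text


def mites_per_100_bees(mite_count: int, sample_weight_g: float) -> float:
    """Phoretic mite load per 100 bees, estimating bee number from sample weight."""
    if sample_weight_g <= 0:
        raise ValueError("sample weight must be positive")
    if mite_count < 0:
        raise ValueError("mite count cannot be negative")
    estimated_bees = sample_weight_g / MEAN_WORKER_WEIGHT_G
    return 100.0 * mite_count / estimated_bees
```

With this conversion, a 28 g sample corresponds to about 200 bees, so three washed-off mites would give a load of about 1.5 mites per 100 bees.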

In addition to these phenotypes, the presence of the desired queens (marked) was checked at each visit to ensure no requeening event had occurred. A last thorough checking was done in mid-August to verify the presence of all queens.

All traits were measured at the end of winter (except for BROOD and VARROA), spring, and early summer (except for HYG). All traits were measured each year except for HYG and RESERVES, which were not measured in 2020. Furthermore, no phenotypes were collected at the end of winter in 2020. Lastly, VARROA was not measured in spring in 2020.

2.3 Pedigree information

The pedigree included the known ancestors of BQs, the BQs themselves, and their offspring TQs. All BQs were inseminated. Each entry of the pedigree file corresponded to a queen. Each line contained the identification of the queen, the queen’s dam, and the queen’s mate (pseudo sire). Further columns characterized the mate of the queen (a pseudo sire) by the identification of the dam of the drone-producing queens (DPQs), the group of DPQs with their expected group size, and lastly, the expected number of drones (ND) it contributed to the mating of the queen.

For BQs, however, only the dam of DPQs used during instrumental insemination was recorded, but not which specific DPQs. To complete the pedigree and in accordance with the beekeepers’ insemination practice, we assumed that one group of three (NS = 3) sister-DPQs was used per insemination and that, for a particular dam of DPQs, this sister group was the same for inseminations performed in a year. According to Kistler et al. (2024), this assumption minimizes the probability of strongly deviating estimates of genetic parameters when the number of DPQ(s) used is uncertain and lies between one and three DPQs. The number of drones used for each artificial insemination was set to ND = 8.

For the open-mated TQs, we hypothesized that a dummy pseudo sire composed of a large (NS = 100) number of unrelated DPQs of unknown origin contributed ND = 12 drones for each open mating to model natural mating conditions (Baudry et al. 1998; Tarpy and Nielsen 2002; Schlüns et al. 2005; Tarpy et al. 2013). Setting the number of DPQs making up this dummy open mating pseudo sire to 100 meant that two TQs’ worker groups were considered paternally almost unrelated.
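The pedigree entries and mating assumptions described above can be pictured with a small data structure. This is only a sketch: the field names, the identifier format for pseudo sires, and the "OPEN_MATING" label are hypothetical, while the values NS = 3, ND = 8 (insemination) and NS = 100, ND = 12 (open mating) are those stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PedigreeEntry:
    queen_id: str
    dam_id: Optional[str]      # dam of the queen (None for founders)
    pseudo_sire_id: str        # mate of the queen (group of DPQs)
    dpq_dam_id: Optional[str]  # dam of the DPQs, None if unknown
    n_dpq: int                 # NS: expected number of DPQs in the group
    n_drones: int              # ND: expected number of contributing drones


def inseminated_bq(queen_id: str, dam_id: str, dpq_dam_id: str, year: int) -> PedigreeEntry:
    # One group of three sister-DPQs per dam of DPQs and per year, eight drones.
    pseudo_sire = f"{dpq_dam_id}_{year}"  # hypothetical identifier format
    return PedigreeEntry(queen_id, dam_id, pseudo_sire, dpq_dam_id, n_dpq=3, n_drones=8)


def open_mated_tq(queen_id: str, dam_id: str) -> PedigreeEntry:
    # Dummy pseudo sire: 100 unrelated DPQs of unknown origin, twelve drones.
    return PedigreeEntry(queen_id, dam_id, "OPEN_MATING", None, n_dpq=100, n_drones=12)
```

Encoding the year in the pseudo-sire identifier reflects the assumption that the same sister group was used for all inseminations from a given dam of DPQs within a year.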

The pedigree included a total of 1087 colonies, including 1022 TQs, 42 dams (of which 20 are founders) used to produce TQ sister-groups (shared and private), and 23 maternal or paternal ancestors.

2.4 Statistical analysis

The pedigree was used to compute the inverse of the relationship matrix, \(\mathbf{A}^{-1}\), following Brascamp and Bijma (2014, 2019).

The performance file contained one entry for each of the 1022 TQ colonies, with identification of the queen, the worker group, the traits’ records, the contemporary group, and the operator measuring each trait at each period. The contemporary group was defined as the interaction between the testing year, apiary, and the queen producer who bred the TQs. Pairwise Pearson’s correlation coefficients were calculated between all phenotypic values, considering all pair-wise complete observations.
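As an illustration of the two definitions above, the following sketch builds a contemporary group label and computes a Pearson correlation on pairwise complete observations. The analysis itself was run in R; this standalone version uses hypothetical names and a hypothetical label format, with missing records represented by `None`.

```python
from math import sqrt


def contemporary_group(year, apiary, queen_producer):
    """Label for the year x apiary x queen-producer interaction (hypothetical format)."""
    return f"{year}|{apiary}|{queen_producer}"


def pearson_pairwise_complete(x, y):
    """Pearson's r using only positions where both records are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)
```

Each pair of traits is therefore correlated on its own set of colonies, so the number of observations can differ between pairs.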

To estimate genetic parameters, the vector of phenotypes y was described using a linear mixed model with only queen effects as genetic effects (QM) (Guichard et al. 2020b; Du et al. 2022):

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}_{q}\mathbf{a}_{q} + \mathbf{e}$$

where b is the vector of fixed effects (general mean, contemporary group (CG), and operator effect when present) with corresponding incidence matrix \(\mathbf{X}\), \(\mathbf{a}_{q}\) is the vector of genetic effects of the queens with incidence matrix \(\mathbf{Z}_{q}\), and e is the vector of residuals. A fixed operator effect was included for HYG, CALM, GENT, BROOD, and RESERVES, as preliminary analyses revealed it had a significant effect. It was not included for HONEY or VARROA because, for each of these traits, the same operator always measured a complete apiary at the same period. The performance file and the inverse relationship matrix were used to estimate genetic parameters and solve the BLUP equations (Henderson 1973) using AIREMLF90 (ver. 1.149) of the BLUPF90 package (Misztal et al. 2002).

All estimations were first run in single trait analyses. The heritability \(h_{\text{QM}}^{2}\) was defined as the fraction of the estimated genetic variance in the QM over the phenotypic variance (sum of the genetic and residual variances of the QM). In addition to the single trait analyses, for all traits measured in early summer with a heritability estimate greater than its standard error, as well as HONEY (total seasonal yield) and HYG in spring, two-trait analyses were also run to obtain genetic correlation estimates. These genetic correlation estimates were compared with Pearson correlation coefficients between raw phenotypes or between phenotypes adjusted for fixed effects as estimated in the BLUP model. The phenotypic correlations might be more informative than the genetic correlation estimates when the latter are uncertain (Cheverud 1988; Roff 1995).
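Written out, the heritability definition given above reads as follows, where \(\widehat{\sigma}_{\text{e}^{(\text{QM})}}^{2}\) is our notation (not defined in the text) for the estimated residual variance of the QM:

$$h_{\text{QM}}^{2} = \frac{\widehat{\sigma}_{\text{A}^{(\text{QM})}}^{2}}{\widehat{\sigma}_{\text{A}^{(\text{QM})}}^{2} + \widehat{\sigma}_{\text{e}^{(\text{QM})}}^{2}}$$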

We performed separate single trait analyses for the same trait measured at multiple periods within the season. We did not average these measurements per colony, because they may represent genetically different traits. In particular, behavioral traits might be influenced by the sequence of measurements, with colonies possibly getting used to being visited. In addition, correction for operator effect becomes complex since operators differed for repeated records. Our data size did not allow for more sophisticated multivariate analyses.

We used R (v4.1.2; R Core Team 2021) with several packages for data formatting (tidyverse: Wickham et al. 2019; data.table: Barrett et al. 2023), computation of summary statistics (pastecs: Grosjean et al. 2018), and production of figures (corrplot: Wei and Simko 2021; rvg: Gohel 2024; officer: Gohel and Moog 2024).

3 Results

3.1 Raw phenotypes

Distributions of all traits in all periods are shown in Supplementary Figures S2 to S8.

Table II shows the descriptive statistics for all traits in all periods.

Table II Summary statistics of raw phenotypes and phenotypes adjusted for fixed effects

Raw coefficients of variation (CV) were highest (≫ 1.00) for both VARROA measures, followed by HYG in spring (0.95) and HONEY (0.81) measures. RESERVES and BROOD had intermediate coefficients of variation (~0.31), while GENT and CALM had the lowest ones (~0.17). Depending on the trait, adjusting phenotypes for fixed effects reduced the CVs by 14% (for HYG at the end of winter) to 37% (for HONEY).
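For reference, the two quantities compared above can be computed as sketched below. The sketch assumes the CV is the sample standard deviation divided by the mean; the text does not state which standard deviation estimator was used, and the function names are illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV as sample standard deviation over the mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(values) / m


def relative_reduction_percent(raw_cv, adjusted_cv):
    """Percentage by which the adjusted CV is lower than the raw CV."""
    if raw_cv <= 0:
        raise ValueError("raw CV must be positive")
    return 100.0 * (raw_cv - adjusted_cv) / raw_cv
```

With purely hypothetical values, a raw CV of 1.00 and an adjusted CV of 0.75 would correspond to a 25% reduction.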

3.2 Genetic parameter estimates in single trait analyses

Estimated genetic parameters for all traits are shown in Table III. All estimates showed high standard errors due to the small size of the dataset. For GENT measured at any period, estimated h2 were zero. Estimated h2 were also near zero for BROOD in spring and for CALM at the end of winter and in spring, and were far from being significantly different from zero.

Table III Genetic parameters of all traits

Traits showing h2 estimates exceeding their standard error were HONEY, CALM in early summer, BROOD in early summer, both VARROA measures, both HYG measures, and all three measures of RESERVES. Heritability estimates for these traits ranged from low (around 0.15) for CALM in early summer, BROOD in early summer, and VARROA in spring, to moderate values (0.30 to 0.40) for HYG in spring, HONEY, and VARROA in early summer. The trait RESERVES showed an intermediate h2, ranging from 0.19 ± 0.16 at the end of winter to 0.25 ± 0.18 in early summer, as did HYG at the end of winter (0.25 ± 0.19).

3.3 Correlations between traits

Figure 1 shows the phenotypic correlations adjusted for fixed effects for all traits. Using the adjusted phenotypes did not affect Pearson’s correlations strongly compared to using raw phenotypes (Supplementary Table S1). Most phenotypic correlations were close to zero, except for some measures of the same trait between periods, such as VARROA in spring and early summer at 0.4. Other exceptions were positive correlations around 0.5 between GENT and CALM measured at the same period; RESERVES in early summer with HONEY (around −0.2) and with BROOD in early summer (around −0.3); and lastly, BROOD in spring and early summer and HONEY (around 0.3).

Figure 1.

Correlations between phenotypes adjusted for fixed effects. Traits: HONEY, total annual honey yield (kg); GENT, gentleness (1–4 rating); CALM, calmness (1–4 rating); BROOD, capped brood surface (× 1000 cell count); VARROA, phoretic V. destructor load (mites/100 worker bees); HYG, hygienic behavior (% of totally and partially cleared cells); RESERVES, honey reserves (1–4 rating). Periods: EndWinter, end of winter (March–April); Spring (late April–May); EarlySummer, early summer (end of June–July).

We estimated genetic correlations only for traits where the h2-estimate exceeded its SE. Resulting estimates showed very large SE and are in Supplementary Table S2. These genetic correlation estimates had the same sign as the corresponding phenotypic correlations that exceeded 0.15 in absolute value.

4 Discussion

We derived estimates of heritability for usual beekeeping traits and traits related to resilience, including a new trait linked to feed autonomy during the beekeeping season, in a newly started breeding program. The data included only colonies of open-mated queens without queen daughters. Because of limited data, we estimated heritability values with a Queen Model. We also clarified how our estimates relate to genetic parameters of a Colony Model. Results showed that there was sufficient genetic variability in most classical traits and in traits related to resilience to enable meaningful genetic progress, including in a new trait measuring feed reserves.

4.1 Reduced models in datasets containing only records from open-mated potential DPQs

We chose to use a reduced model (with only a genetic queen effect) because our records were all from colonies with open-mated queens (TQs) that did not contribute diploid offspring to the breeding population. Primarily because of technical limitations in this initialization phase of the breeding program, the beekeepers were not yet able to performance test sufficient BQs, but only open-mated descendants. These offspring were candidates for selection as DPQs. Open mating meant that the worker group from a tested colony always had an unknown sire. Two worker groups were therefore related only if their dams (TQs) were related, and the relationship between the worker groups fully depended on the relatedness between their dams. Suppose two tested colonies had queens related by a coefficient \(r\), then their worker groups were related by \(\frac{1}{4}r\). This made the separate contribution of queens and worker groups to colony phenotypes statistically indistinguishable. Given our data structure, the genetic variance estimated in the Queen Model (\({\widehat{\sigma }}_{{\text{A}}^{(\text{QM})}}^{2}\)) captures a quarter of the genetic variance for the worker effect (\({\sigma }_{{\text{A}}^{\text{W}}}^{2}\)), the full genetic variance for the queen effect (\({\sigma }_{{\text{A}}^{\text{Q}}}^{2}\)), and the covariance between both effects (\({\sigma }_{{\text{A}}^{\text{QW}}}\)) as defined in the Colony Model (CM). Appendix B shows the relevant mathematics, including a proof derived independently by Manuel Du (personal communication; also see Du et al. 2022, Eq. 4a).
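As a sketch of why these three components appear together (Appendix B remains the reference derivation; the notation \(g_i\), \(a^{\text{Q}}\), \(a^{\text{W}}\), \(m_i\) and the simplifying assumptions are ours), suppose the genetic value of colony \(i\) with queen \(q_i\) and worker group \(w_i\) is \(g_i = a^{\text{Q}}_{q_i} + a^{\text{W}}_{w_i}\), and that the worker-group value is \(a^{\text{W}}_{w_i} = \frac{1}{2}a^{\text{W}}_{q_i} + \frac{1}{2}a^{\text{W}}_{s_i} + m_i\), where the open-mating sire contribution \(a^{\text{W}}_{s_i}\) and the sampling term \(m_i\) are independent between colonies. For two colonies \(i \ne j\) whose queens have additive relationship \(r\):

$$\text{Cov}(g_i, g_j) = r\,\sigma_{\text{A}^{\text{Q}}}^{2} + 2\cdot\tfrac{1}{2}\,r\,\sigma_{\text{A}^{\text{QW}}} + \tfrac{1}{4}\,r\,\sigma_{\text{A}^{\text{W}}}^{2} = r\left(\sigma_{\text{A}^{\text{Q}}}^{2} + \sigma_{\text{A}^{\text{QW}}} + \tfrac{1}{4}\,\sigma_{\text{A}^{\text{W}}}^{2}\right)$$

Under these assumptions, every covariance between colonies is proportional to the relationship between their queens, so the variance fitted in the QM would correspond to the single sum \(\sigma_{\text{A}^{\text{Q}}}^{2} + \sigma_{\text{A}^{\text{QW}}} + \frac{1}{4}\sigma_{\text{A}^{\text{W}}}^{2}\), and its three components cannot be estimated separately.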

Our data structure differs, for example, from the case of Andonov et al. (2019), where records were also from colonies with open-mated queens, but the queens were used as BQs. In such a situation, results obtained with a Queen Model (QM) do not predict results obtained with a Worker Model (WM), as pairs of colonies with records do not always have the same proportionality factor between their queen’s relationship and their worker groups’ relationship. In such cases, queen effects can potentially be distinguished from worker effects, thereby also potentially enabling the use of a Colony Model (CM).

In our specific case, QM and WM estimates could be predicted from one another because they captured the same covariance structures. However, there is no general relationship between estimates of the reduced models and the CM, nor between the two reduced models, because their estimates depend on the data structure.

4.2 Heritability estimates and comparison with the literature

HONEY and HYG showed heritability estimates in the range of what has been reported in genetic analyses using REML parameter estimates (Brascamp et al. 2016, 2018; Andonov et al. 2019; Guichard et al. 2020b; Hoppe et al. 2020). A detailed comparison is not useful given our large SE.

Although GENT and CALM still showed phenotypic variability after correction for fixed effects, their heritability estimates were virtually zero (except CALM in early summer). This is similar to values reported for Swiss carnica (Guichard et al. 2020b). Both traits appeared heritable, however, in Swiss mellifera (Guichard et al. 2020b), and German (Hoppe et al. 2020) and Austrian (Brascamp et al. 2018) carnica.

The trait VARROA was derived from the mite count on a bee sample and, therefore, from a meristic trait with a small mean, especially in spring, where 63% of the colonies had zero mites. It had a low h2 in spring (0.13 ± 0.10). Later in the season, the proportion of colonies with zero mites decreased to 26%, increasing the mean of the trait. The h2 in early summer also appeared higher (0.33 ± 0.16). However, with the four-fold increase in the genetic standard deviation being accompanied by a proportional increase in the mean, the genetic CVs of the trait were about the same (around 0.7) in both periods.

VARROA is generally reported to have low heritability, although most estimates rely on extremely small sample sizes (reviewed by Guichard et al. 2020a). Although the SE of our heritability estimates were large, VARROA appeared suited for selection, as it had the largest genetic coefficient of variation, giving good prospects for evolvability (Houle 1992) of this trait.
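The argument about the genetic CV of VARROA can be illustrated numerically. This is a minimal sketch assuming the genetic CV is the genetic standard deviation divided by the trait mean (in the spirit of Houle 1992); the numbers are hypothetical and were chosen only to reproduce a four-fold increase in both quantities.

```python
from math import sqrt


def genetic_cv(genetic_variance: float, trait_mean: float) -> float:
    """Genetic coefficient of variation: genetic SD divided by the trait mean."""
    if genetic_variance < 0:
        raise ValueError("variance cannot be negative")
    if trait_mean <= 0:
        raise ValueError("trait mean must be positive")
    return sqrt(genetic_variance) / trait_mean


# Hypothetical values: the genetic SD and the mean both increase four-fold,
# so the genetic variance increases sixteen-fold.
cv_spring = genetic_cv(0.49, 1.0)
cv_early_summer = genetic_cv(0.49 * 16, 4.0)
```

Both calls give a genetic CV of about 0.7, which shows that a proportional increase in the genetic standard deviation and in the mean leaves the genetic CV unchanged, even though the heritability can differ between periods.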

We calculated the HYG score by including partially removed brood, as very few colonies had some completely removed cells within a few hours following brood damage. Furthermore, preliminary results (on 132 colonies) showed that, while early uncapped brood cells could be recapped later by workers, partially removed brood was practically always followed by complete removal after 24 h. With a waiting time of only 3.5 h in spring, compared to 5 h at the end of winter, the average HYG score was further below the ideal 50% value than at the end of winter, and a fifth of colonies had less than 5% cells with partially or totally removed brood. Even so, both h2 and the genetic CV for HYG seemed higher in spring than at the end of winter, although estimates were not statistically different.

The new trait RESERVES had estimates of heritability and evolvability that suggest a meaningful variability of this trait whenever it was measured from the end of winter to early summer. Comparison with more labor-intensive measurements, using the brood chamber’s weight or the visual assessment of the total feed reserves’ surface (Hernandez et al. 2020), showed that the resulting Spearman’s correlation coefficients were very high (unpubl. data). As the testing protocol (Appendix A) is relatively simple to implement, this trait could be widely used by beekeepers for their selection. Feed autonomy during the season results from the colony foraging more feed than it consumes. Its dependence on foraging capacity thus makes it a different trait from feed autonomy during winter. Whether a genetic increase of the feed reserves in the brood chamber is desirable will depend heavily on management practices (total size of the brood chamber, sedentary or moving apiaries…) and floral abundance and variability through time. Too many feed reserves in the brood chamber would eventually limit the space available for brood and result in a reduced foraging capacity. Beekeepers would probably aim at an optimum for the RESERVES trait, reached when the feed reserves stored in the brood chamber are sufficient to enable a good colony development and never necessitate emergency feeding.

Feed autonomy is expected to be favorably linked with survival. Survival in given environmental conditions is an ultimate resilience trait (i.e., fitness assessment) in animal breeding because it measures an individual’s resistance to multiple mortality factors in a specific environment. We did not study colony survival, however, because our limited dataset size could not allow for a meaningful genetic analysis of such a fitness trait that is well-known to be weakly heritable both in farm and wild animal populations due to unidentified environmental effects leading to high residual variation (Houle 1992). We may note that even from the large data set in the BeeBreed program (Bienefeld and Hoppe 2024), no estimates of genetic parameters for survival were published.

4.3 Correlations between traits

We found a phenotypic correlation of around 0.5 between GENT and CALM measured at the same period, similar to results of Hoppe et al. (2020) in German carnica. We also found a moderate positive phenotypic correlation between BROOD and HONEY, as reported in the literature (e.g., Kretzschmar and Maisonnasse 2022).

Furthermore, we found a moderate negative phenotypic correlation between BROOD and RESERVES, also reflected in the negative genetic correlation estimate, albeit highly uncertain (−0.8 ± 1.5). A hypothesis is that BROOD and RESERVES reflect the proportion of a frame either filled with honey or brood. Consequently, there is an antagonism between the two traits, as frames completely filled with honey cannot be filled with capped brood and vice-versa.

In addition, for the moderately negative correlation (phenotypic and genetic) between RESERVES in early summer and HONEY, two explanations can be proposed. First, both traits might be negatively correlated indirectly because of the negative correlation between RESERVES and BROOD and the positive one between BROOD and HONEY. Second, when colonies produce a fixed amount of honey and store more honey in the brood chamber, then they must store less of it in the super, lowering the honey yield. Conversely, variation in total honey production could create a positive correlation between HONEY and RESERVES. This positive correlation would be due to colonies having more honey in both the super and the brood chamber when producing more in total, assuming that fixed proportions are allocated to the brood chamber and the super (van Noordwijk and de Jong 1986). Furthermore, colonies that store more honey in the brood chamber could have an advantage by suffering less from honey flow shortages, in turn benefitting honey production during strong honey flows. Such events were observed by the beekeepers and motivated them to integrate the RESERVES trait into their breeding objective. However, this advantage during honey flow shortages might have been partially suppressed by the management practices of the beekeepers, who provide emergency feeding to apiaries when needed, thus partially masking the expected effect of colonies with good RESERVES values on colony development and therefore on yield.

Lastly, HYG and VARROA had weak negative phenotypic (around −0.1) and genetic correlations, such that, to a small extent, selection for HYG may have an adverse effect on V. destructor load as also previously reported (Hoppe et al. 2020).

5 Conclusion

We reported results of a genetic analysis of a small honeybee population from an emerging breeding plan targeting the improvement of classical traits and traits related to resilience. In this initial phase, records were only taken on colonies of open-mated queens produced by controlled mated dams. The open-mated queens were only used as potential drone-producing queens. This created a structure in the data in which the queen effect on colony performance cannot be separated from the worker effect. For this reason, we used the Queen Model. Heritability estimates for production traits and traits related to resilience suggest that selection will be effective, although all our estimates had large standard errors due to the small dataset size. A genetic antagonism between the trait measuring feed reserves and honey yield, however, might limit simultaneous improvement on both traits.