1 Introduction

Honey bees (Apis mellifera) are important pollinators of many crops (Klein et al. 2007; Gallai et al. 2009; Potts et al. 2010; Breeze et al. 2011; Hung et al. 2018). While the total number of honey bee colonies has increased worldwide (Phiri et al. 2022), pressures from pesticides, pests, diseases, heavy metal pollution, extreme weather, and lack of forage (Smith et al. 2013; Goulson et al. 2015; Steinhauer et al. 2018; Insolia et al. 2023; Yang et al. 2023) have led to widespread annual colony losses (Gray et al. 2022). Increases in land devoted to pollination-dependent crops mean that honey bee populations are unlikely to meet demand for their services (Aizen and Harder 2009; Aizen et al. 2019; Mashilingi et al. 2022). To mitigate these factors, many countries have implemented modern animal breeding programmes (Bienefeld et al. 2008; Büchler et al. 2013; Zakour and Bienefeld 2014; Tahmasbi et al. 2015; Bixby et al. 2017; Costa-Maia et al. 2018; Guichard et al. 2020; Hoppe et al. 2020; Petersen et al. 2020; Maucourt et al. 2021).

To support honey bee breeding programmes, evaluation of genetic diversity at the complementary sex determiner (csd) locus is crucial (Page and Laidlaw 1985). Honey bees have a haplodiploid sex determination system. Sex is determined by zygosity at csd (Beye et al. 2003). Bees that are heterozygous at the csd locus become females, hemizygous (unfertilised) individuals become males (drones), and those that are homozygous become abnormal diploid drones that are killed during development (Mackensen 1951; Woyke 1963, 1980). If a queen mates with a single male who carries the same csd allele as herself, 50% of her diploid offspring will be killed (Page 1980).
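The 50% figure follows from simple segregation arithmetic, which also shows how polyandry dilutes the effect. The following is a minimal sketch, assuming that the queen transmits each of her two alleles to half of her eggs and that every drone sires an equal share of the brood. The function name and the equal-paternity assumption are illustrative and are not taken from the cited studies.

```python
def diploid_male_fraction(queen_alleles, drone_alleles):
    """Expected fraction of diploid (fertilised) offspring homozygous at csd.

    Illustrative assumptions: the queen passes on each of her two alleles
    with probability 1/2, and every drone sires an equal share of the brood.
    """
    a, b = queen_alleles
    if a == b:
        raise ValueError("a viable queen must be heterozygous at csd")
    matching = sum(1 for d in drone_alleles if d in (a, b))
    # A drone sharing an allele with the queen yields homozygous
    # offspring in half of the eggs he fertilises.
    return 0.5 * matching / len(drone_alleles)


# Single mating with a drone that shares one of the queen's alleles.
single = diploid_male_fraction(("A", "B"), ["A"])

# Ten matings, only one of which involves a matching drone.
poly = diploid_male_fraction(("A", "B"), ["A"] + [f"X{i}" for i in range(9)])

print(single, poly)
```

Under these assumptions, a single matched mating removes half of the diploid brood, whereas one matched drone among ten removes only one twentieth.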

Colonies producing high rates of homozygous diploid males—as would occur if a queen has a genotype comprised of common alleles—often fail to thrive or be productive due to poor population growth (Page 1980; Woyke 1980), and are often inviable (Woyke 1963; Kaskinova et al. 2019). High allelic diversity at csd is therefore important for avoiding colony loss. Past studies have shown that increased csd diversity will result in higher expected brood viability in a population (Page and Laidlaw 1985).

Honey bee queens are polyandrous, typically mating with more than ten drones (Neumann et al. 1999; Tarpy et al. 2004, 2010, 2015; Simone-Finstrom and Tarpy 2018; Chapman et al. 2019a), at congregation areas that can attract drones from a distance of over 3.5 km (Utaipanon et al. 2019). As monandry is extremely rare (Neumann et al. 1999; Tarpy et al. 2004, 2010, 2015; Simone-Finstrom and Tarpy 2018; Chapman et al. 2019a) and the number of csd alleles is high (Bilodeau and Elsik 2021), csd homozygosity in diploid progeny is rare in nature (Page 1980). In closed breeding systems, there is potential for loss of csd alleles, as queens are produced from a small number of colonies. Over multiple generations, and without external genetic supplementation, there is a chance of decreased diversity of csd alleles in the population (Page and Laidlaw 1985). Another factor affecting csd diversity is the rapid collapse of colony numbers, for instance a pest or disease outbreak resulting in widespread colony death, which can put honey bees at greater risk of inbreeding (Büchler et al. 2010; Hristov et al. 2020).

To assist with monitoring breeding programmes, investigation of the frequency and diversity of csd alleles can inform which colonies to produce queens and drones from (Hyink et al. 2013). Data on population-wide csd diversity can be used as a comparison point to measure changes over time, alerting breeders to decreased genetic diversity. Investigation of csd alleles has been performed in some overseas breeding programmes (Table I). All programmes found high csd diversity (Hyink et al. 2013; Zareba et al. 2017; Kaskinova et al. 2019; Bilodeau et al. 2020; Paolillo et al. 2022).

Table I Number of complementary sex determiner (csd) alleles found in the literature. Proportion is the number of alleles divided by the potential number of alleles found (number of colonies times two)

A national honey bee genetic improvement programme (Plan Bee) was launched in Australia in 2020 (Chapman and Frost 2021a; Chapman et al. 2022). Plan Bee provides a platform to implement modern animal breeding techniques in the Australian beekeeping industry. It consists of a research population managed by the New South Wales (NSW) Department of Primary Industries and a network of bee breeders that submit data. Plan Bee aims to drive improvement in traits important to the beekeeping and honey bee pollination-dependent industries, such as disease resistance, honey production, colony strength, and good temperament (Frost et al. 2021, 2022; Chapman and Frost 2021a). Successful implementation will allow commercial beekeeping to remain profitable and maintain strong healthy bee populations (Banks et al. 2021; Chapman and Frost 2021b, 2022), ensuring pollination services for industries that depend upon honey bees. In Australia, more than 35 industries rely on paid honey bee pollination services, estimated to be worth AU$14.2 billion (Clarke and Le Feuvre 2021).

In this study, we investigate csd diversity at the hypervariable region (HVR) for the first time in four Australian breeding populations. The HVR is within the potential specifying domain (PSD) located on exons 6–8 of the csd gene (Zareba et al. 2017) and is the driving force of high allele variation seen in csd (Beye et al. 2013; Lechner et al. 2014; Zareba et al. 2017). The aim of the study was to determine if csd diversity was similar to that elsewhere in the world. If it were low, this would indicate that further sampling was required. If further sampling again showed low diversity, this would indicate that Australian populations could require careful management to both maintain diversity and avoid lethal csd allele combinations.

The first population was from a bee breeder in NSW who has been operating for approximately 8 years with four breeding lines sourced from commercial beekeepers and breeders around Australia with ~ 350 colonies. Mating is controlled mostly through drone flooding (providing many drones from the selected population such that queens mate predominantly with selected stock rather than drones from other colonies in the area) at isolated mating stations (> 4 km from other beekeepers) but may not be completely isolated due to the potential presence of feral (unmanaged) bees. There is also some use of artificial insemination.

The second population is from Better Bees WA Inc. The programme was initiated in 1979–1980 to provide Western Australia (WA) with quality stock following the closure of the borders to prevent European foulbrood from entering the state (Kühnert et al. 1989). For the first few years, an isolated mating station on Rottnest Island was used. From 1983, the programme was maintained by artificial insemination with line breeding, queens being inseminated with homogenised semen collected from drones of all lines (Kühnert et al. 1989). When it was taken over by beekeepers in 1991, isolated mating was again utilised (Chapman et al. 2008). Annually, the beekeepers (11 at the time of this study) in the consortium take 50–70 of the top performing colonies to provide drones for hundreds of queens to mate with on Rottnest Island, where there are no feral bees (Chapman et al. 2008). These breeder queens are then returned to the mainland and used to produce daughter queens that are typically open-mated and distributed predominantly within WA. The number of lines in the programme has varied over the years, starting with 20 lines (Kühnert et al. 1989), having 24 in 2005 (Chapman et al. 2008), and 31 at the time of this study.

The third population is a research and development stock run by the New South Wales Department of Primary Industries. The population was established in March 2020 with stock from Western Australia, Queensland, and NSW bee breeders, with new genetics introduced in 2021 from South Australia and Tasmania. This population consists of 250 colonies made of two subpopulations—black and yellow. Stock is maintained using a mix of open mating and artificial insemination.

The last population is from a bee breeder located in Queensland who has been commercially breeding bees since 2013 with ~ 450 colonies. There are eight breeding lines, originating from several sources including queen bee breeders from NSW and Kangaroo Island, South Australia. Breeding stock is maintained by AI and open mating with drone flooding.

2 Material and methods

2.1 Sample collection and DNA extraction

Up to six drone pupae (range 1–6) were collected per colony. As drones are unfertilised, sampling six provides a 96% probability of observing both queen alleles (Hyink et al. 2013). Samples were collected from 37 colonies from the NSW beekeeper in April 2021, 55 drone-producing colonies from Better Bees WA in October 2020, 97 colonies from the NSW Department of Primary Industries in 2021–2022, and 20 colonies from the Queensland beekeeper in 2018. Samples were preserved in 100% ethanol or on dry ice and stored at − 20 °C.
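The dependence of detection probability on the number of drones sampled can be sketched with a simple binomial argument. This is a minimal illustration, assuming that each drone independently inherits either queen allele with probability 1/2 and that all drones are sons of the queen; the function name is hypothetical. Under this simple model, six drones give a probability of about 0.97, in line with the approximately 96% cited from Hyink et al. (2013), whose exact assumptions may differ.

```python
def prob_both_alleles(n_drones):
    """Probability that n haploid sons of one queen show both of her alleles.

    Assumes each drone independently carries either allele with
    probability 1/2 (an illustrative model, not the cited calculation).
    """
    if n_drones < 1:
        raise ValueError("at least one drone is required")
    # All n drones carry the same allele with probability 2 * (1/2)**n.
    return 1.0 - 2.0 * 0.5 ** n_drones


for n in (1, 2, 4, 6):
    print(n, round(prob_both_alleles(n), 3))
```

The same function indicates how quickly detection falls away for colonies where fewer than six pupae were available.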

DNA was extracted from the hind legs of the pupae. Twenty milligrams of tissue was homogenised in 400 µl of 5% Chelex (Bio-Rad, United States) using a TissueLyser II (Qiagen, Germany). The tissue was then boiled for 10 min in a water bath (Oldroyd et al. 1997) and centrifuged at 1883 RCF for 20 min. One hundred microliters of supernatant was then stored at 4 °C and used as a DNA template for PCR.

2.2 csd sequencing

The HVR of csd was amplified using the primers M13Fconscsdrev and AD1genoRfw (Hyink et al. 2013). PCR was performed using 0.15 µl KAPA polymerase (Merck, Germany), 3 µl template DNA, 0.40 µM primer, and 1.5 mM MgCl2 in 20 µl reactions on a Veriti™ 96-Well Fast Thermal Cycler (Applied Biosystems, USA). The thermocycling protocol was 95 °C for 5 min, 35 cycles of 95 °C for 30 s, 49 °C for 15 s, 72 °C for 30 s, and a final extension of 72 °C for 10 min.

To distinguish between each of the queen’s alleles, restriction digests were used, which resulted in restriction fragment length polymorphisms (RFLP), where bands of different sizes indicate different alleles to sequence. Five microliters of drone PCR product was digested with ApoI (New England BioLabs, USA). The restriction digest was incubated for 30 min at 49.5 °C and then run on a 1.5% agarose gel in sodium borate buffer, containing SYBR Safe DNA gel stain (Invitrogen, USA) and a 1 Kb Plus DNA Ladder (Invitrogen, USA) at 160 V for 25–30 min. The digests were visualised on the Molecular Imager® Gel Doc™ XR System (Bio-Rad, USA) using Image Lab software v6.1 (Bio-Rad). PCR products representing each of the queen’s csd alleles were sent for sequencing in the forward direction at Macrogen (South Korea) using sequencing primer AD1 (Hyink et al. 2013). In some cases, only one RFLP profile was found from a colony, which may be due to only drones carrying the same queen allele being sampled by chance (expected 4% of the time where six drones were sampled per colony) or due to alleles not differing at ApoI sites. In some cases, more than two RFLP profiles were found from a colony. This could be due to the colony being queenless, resulting in the workers activating their ovaries and producing unfertilised eggs; to rare workers activating their ovaries and producing unfertilised eggs in the presence of the queen (Ratnieks 1993); or to mutation. Where more than two alleles were found from a colony, the sequences were aligned and assessed for mutation or recombination.
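The logic of the RFLP screen, in which alleles that differ at ApoI sites yield different band patterns, can be illustrated with an in-silico digest. This is a sketch only: it assumes the commonly listed ApoI recognition sequence 5'-RAATTY-3' (R = A or G, Y = C or T) with the cut placed after the R, a detail taken from general enzyme references rather than from this study. The toy sequences are hypothetical and are not real csd amplicons.

```python
import re

# Assumed ApoI site: RAATTY, cut after the first base (R^AATTY).
# The lookahead allows overlapping sites to be found.
APOI_SITE = re.compile(r"(?=[AG]AATT[CT])")


def apoi_fragments(seq):
    """Fragment lengths from a complete in-silico ApoI digest of a linear sequence."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in APOI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 20 bp toy amplicons differing by one base inside the site.
allele_1 = "CCCCCGAATTCCCCCCCCCC"  # contains GAATTC, so it is cut
allele_2 = "CCCCCGTATTCCCCCCCCCC"  # site destroyed, so it stays intact

print(apoi_fragments(allele_1))
print(apoi_fragments(allele_2))
```

Two alleles that share the same set of sites would give identical fragment lists, which is one reason a colony can show a single RFLP profile even when both queen alleles were sampled.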

2.3 Data analysis

DNA sequences were manually checked for quality; 47 sequences were excluded as less than 25% of the contig was high quality under Geneious Prime’s (version 2021.2, Biomatters Ltd., New Zealand) calculation of Phred quality score. The sequences were trimmed to the six amino acid flanking sequence upstream of the HVR and the highly conserved ‘IEQI’ sequence downstream following Bilodeau and Elsik (2021) and aligned using Clustal Omega alignment in Geneious Prime (Dotmatics).

The nucleotide sequence was translated to the amino acid sequence in Geneious Prime. Sequences differing by one or more amino acids were considered different alleles. The alleles were matched to the Hymenoptera Genome Database (Bilodeau and Elsik 2021). Alleles that did not have a 100% match to a database record were given new identification labels. Newly identified alleles (16) were aligned pairwise via a Needleman-Wunsch global alignment in Geneious Prime, giving 120 alignments that cover every possible pair of new csd alleles, and the number of amino acid differences in each pair was calculated.
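The figure of 120 alignments is simply the number of unordered pairs among 16 alleles. The sketch below enumerates those pairs and counts differences between two sequences that are already aligned. It does not reproduce the Needleman-Wunsch alignment performed in Geneious Prime, and treating a gap as one difference is an assumption made here for illustration. The labels and the short aligned fragments are placeholders.

```python
from itertools import combinations


def count_differences(aligned_a, aligned_b):
    """Count differing columns between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


# Sixteen placeholder labels standing in for the newly identified alleles.
labels = [f"new_{i:02d}" for i in range(1, 17)]
pairs = list(combinations(labels, 2))
print(len(pairs))

# Hypothetical aligned fragments, purely for illustration.
print(count_differences("NKYNY-NNY", "NKYNYKNNY"))
```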

3 Results

For the NSW beekeeper, 15 out of 37 colonies had fewer than six pupae collected (mean = 4.9, range = 1–6), due to poor drone availability in April. Better Bees WA had six drone pupae collected from each colony; in the NSW Department of Primary Industries samples, 96 out of 97 colonies had six drones. The Queensland beekeeper had only one colony out of 20 with six drones collected (mean = 4, range = 1–6).

3.1 Amino acids

Eighty-two unique csd amino acid sequences were identified (Supplementary Table 1). Sixteen had not been previously reported and the other 66 matched previously identified alleles in the Hymenoptera Genome Database (Bilodeau and Elsik 2021). The new alleles are deposited in GenBank under the accession numbers ON507758–ON507774.

Only a single amino acid sequence was characterised in nine colonies for the NSW beekeeper, six colonies for Better Bees WA, ten colonies for the NSW Department of Primary Industries, and five colonies for the Queensland beekeeper, where two were expected due to queens carrying two alleles. Eight colonies had three amino acid sequences characterised, while three had four amino acid sequences characterised. Sequence alignments indicated that the origin of these alleles was from workers laying eggs rather than mutations or recombination.

The NSW beekeeper had 34 unique alleles characterised from 37 colonies, Better Bees WA had 30 unique alleles from 55 colonies, the NSW Department of Primary Industries had 58 from 97 colonies, and the Queensland beekeeper had 19 alleles from 20 colonies.
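These counts can be converted to the proportion defined for Table I, that is, the number of unique alleles divided by twice the number of colonies sampled. The short sketch below does this for the four populations using only the counts given in the text; the abbreviated population labels and the function name are illustrative.

```python
# Counts reported in the text: (unique alleles, colonies sampled).
populations = {
    "NSW beekeeper": (34, 37),
    "Better Bees WA": (30, 55),
    "NSW DPI": (58, 97),
    "Queensland beekeeper": (19, 20),
}


def allele_proportion(n_alleles, n_colonies):
    """Unique alleles as a percentage of the maximum possible (two per queen)."""
    return 100.0 * n_alleles / (2 * n_colonies)


for name, (alleles, colonies) in populations.items():
    print(f"{name}: {allele_proportion(alleles, colonies):.1f}%")
```

The measure assumes that every sampled drone is a son of the queen, so drones laid by workers can inflate it.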

No allele was found in all four populations (Figure 1). Better Bees WA and the Queensland beekeeper had the fewest alleles in common (3), while the NSW beekeeper and NSW Department of Primary Industries shared the most (24). Seven sequences occurred in both of the NSW Department of Primary Industries subpopulations. The black subpopulation had 36 unique alleles found from 33 colonies and the yellow subpopulation had 51 alleles found from 57 colonies.

Figure 1. Venn diagram representing the overlap of alleles found in the four populations in this study: Better Bees WA Inc, individual bee breeders from New South Wales and Queensland, and the research population managed by the New South Wales Department of Primary Industries

The newly characterised alleles varied in length from 29 to 53 amino acids. Pairwise alignments revealed an average difference of 14.46 ± 0.47 amino acids between pairs and a range of 1–29 (Figure 2; Supplementary Table 2).

Figure 2. Number of differences observed between pairs of 16 newly identified amino acid sequences for the hypervariable region (HVR) of the complementary sex determiner (csd) gene in honey bees (Apis mellifera) when aligned pairwise (120 combinations in total)

4 Discussion/Conclusion

The results from this study demonstrate that there is sufficient diversity at csd in Australia, using other countries as comparison, boding well for the Plan Bee programme. Generally, allele diversity at csd is reported as the number of csd alleles found in the total colonies sampled. Here, we have reported the number of alleles found as a proportion of the possible number found assuming that only the queen produced drones (i.e. the number of colonies multiplied by two, the number of csd alleles the queen carries). The proportions found in this study for all four populations (27.3–47.5%) are within the range reported worldwide (19.0–66.5%; Table I). While this method enables comparison of diversity among populations, it is problematic. Firstly, workers are known to produce drones in the presence of the queen (Ratnieks 1993; this study) and this calculation cannot take that into consideration. Secondly, it does not take into consideration the probability of sampling both queen alleles due to differences in sample size per colony. Finally, it does not provide information about whether csd proportions are diverse enough to sustain long-term growth. Nevertheless, it does provide a means of comparing populations. The results of this study contain both underestimations due to missing queen alleles and overestimations due to worker reproduction; this however does not affect the main conclusion that there is sufficient csd diversity (82 unique alleles) in Australia. New Zealand’s breeding population has the lowest recorded csd diversity at 19.0% and 16 unique alleles (Table I). As the New Zealand programme is sustaining normal breeding, it is used as a baseline for minimum viable csd population diversity (Hyink et al. 2013; Zareba et al. 2017; Kaskinova et al. 2019). Next-generation sequencing on honey samples has revealed 160 alleles identified from 12 colonies (Bovo et al. 2021). Thus, Bovo et al. (2021) show that high csd diversity in populations makes it unlikely for diploid males to occur at high frequency.

The csd diversity in Western Australia could be expected to be low as the state has banned the import of bees since the 1970s due to the risk of introducing exotic honey bee diseases from the eastern states, and rates of feral bee introgression are low (Chapman et al. 2008, 2016). Better Bees WA has not observed any signs of inbreeding and has been able to maintain csd allele diversity with 30 alleles from 55 colonies. This observation is consistent with Better Bees WA’s high genetic diversity at microsatellites (Chapman et al. 2008). Better Bees WA uses a high number of drone-producing colonies each year from different sources for mating (Chapman et al. 2008; Oxley and Oldroyd 2010). In 2020, 55 drone-producing colonies were taken to Rottnest Island. Coupled with the rapid mutation of the csd gene, which is also under balancing selection (Ding et al. 2021), this has allowed novel csd variants not only to emerge, but also to persist in the population. The ability of csd variants to persist in populations was observed in A. cerana in Australia, where a single introduced colony was able to successfully establish an invasive population with the aid of strong balancing selection that equalised allele frequencies, keeping initially rare csd alleles in the population (Gloag et al. 2019; Ding et al. 2021). It is likely that the combination of management, mutation, and selection has helped maintain csd diversity.

Computer simulations have been used to estimate csd loss in closed breeding populations; an early study of 50 colonies and 10 csd alleles found a 95% chance of mean brood viability being at least 85% after 40 generations (years) (Page and Laidlaw 1982). The present study found 30 alleles in Better Bees WA, three times the number in the simulation study, even though only a small proportion of the population was sampled. There were 48,978 registered hives in WA in 2019 (Clarke and Le Feuvre 2021). It therefore seems likely the programme can continue for another 40 years.
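As a rough illustration of why the number of alleles matters for brood viability, the idealised expectation under equal allele frequencies can be computed directly. This sketch is not the simulation of Page and Laidlaw, which tracks allele loss over generations. It assumes random mating and n equally frequent alleles, in which case a drone matches one of the queen's two alleles with probability 2/n and half of the resulting diploid offspring are homozygous. The function name is hypothetical.

```python
def expected_brood_viability(n_alleles):
    """Expected diploid brood viability with n equally frequent csd alleles.

    Illustrative model: random mating and equal allele frequencies, so the
    expected homozygous (diploid male) fraction is (2/n) * (1/2) = 1/n.
    """
    if n_alleles < 2:
        raise ValueError("at least two alleles are needed for viable queens")
    return 1.0 - 1.0 / n_alleles


for k in (10, 16, 30, 82):
    print(k, f"{expected_brood_viability(k):.3f}")
```

Real populations have unequal allele frequencies, which lowers viability relative to this idealised value, so the numbers should be read as upper bounds for a given allele count.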

Both individual queen bee breeders in NSW and Queensland had a greater proportion of unique alleles compared to the number of alleles in the consortium of 11 beekeepers in Better Bees WA. Both individual queen bee breeders source their queens from a variety of stock from all over Australia and have been operating for less than 10 years.

NSW Department of Primary Industries’ black and yellow subpopulations both display csd diversity comparable to other breeding populations, with black having 54.5% of the potential number of alleles given the number of colonies investigated and yellow having 44.8%. Both values are higher than the proportion obtained for the combined NSW Department of Primary Industries population (Table I).

This study revealed alleles at high frequency in three populations. The NSW beekeeper had one allele present in 17.4% of samples; another allele was present in 13.6% of Better Bees WA samples. The Queensland beekeeper had three alleles at frequencies above 10%. The overuse of the offspring of these queens would result in further overrepresentation of these alleles. In naturally reproducing populations, csd alleles will be maintained via balancing selection (Cho et al. 2006; Hasselmann and Beye 2006; Lechner et al. 2014; Gloag et al. 2019; Ding et al. 2021). Breeding populations take the queens and drones from the colonies that perform the best commercially, and propagate their genetics in the next generation, potentially leading to narrowing in the number of csd alleles or some alleles appearing at high frequency. Taking csd alleles and inbreeding in general into consideration is important for the long-term maintenance of breeding populations. In particular, breeding programmes that use artificial insemination should consider testing csd alleles as this technique is both time-consuming and requires expert skills (Khan et al. 2022). As it is costly to produce queens using artificial insemination, issues with low csd diversity would be detrimental and could result in loss of accumulated genetic gain. Queens could thus be chosen as drone or queen mothers and crossed such that diversity at csd is maximised. Checking csd alleles of queens used in artificial insemination is currently applied in the closed breeding programme in New Zealand (Hyink et al. 2013). The methods used in this study should also be used to estimate csd diversity in other Australian breeding populations of importance. For instance, Kangaroo Island lost over a quarter of its hives in the 2019–2020 Australian bushfires (Clarke 2020) and has had no import of bees since 1885 (Chapman et al. 2019b).
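Flagging over-represented alleles of the kind described above is straightforward once allele calls are tabulated. The following is a minimal sketch with hypothetical allele labels and made-up counts. The 10% cut-off mirrors the figure mentioned for the Queensland population but is otherwise an arbitrary illustrative threshold, not a recommendation from this study.

```python
from collections import Counter


def common_alleles(allele_calls, threshold=0.10):
    """Return alleles whose frequency among the calls exceeds the threshold."""
    if not allele_calls:
        raise ValueError("no allele calls supplied")
    total = len(allele_calls)
    counts = Counter(allele_calls)
    return {
        allele: n / total
        for allele, n in counts.items()
        if n / total > threshold
    }


# Hypothetical calls: two frequent alleles plus twelve alleles seen once each.
calls = ["A1"] * 5 + ["A2"] * 3 + [f"R{i}" for i in range(12)]
flagged = common_alleles(calls)
print(sorted(flagged.items()))
```

A breeder could use such a list when choosing drone and queen mothers, so that frequent alleles are not paired more often than necessary.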

This study identified 16 new csd alleles, and without doubt, if more samples had been taken, more alleles would have been found, as each study identifies new alleles. The new alleles identified in this study show a similar range in length and a similar distribution of pairwise differences to those in other studies (Zareba et al. 2017; Bovo et al. 2021). csd alleles are traditionally considered diverse enough to make a functional pairing if a pair has a minimum of five amino acid differences (Beye et al. 2013; Zareba et al. 2017). However, this rule has shortcomings, as in A. cerana a single amino acid difference in the HVR was sufficient to result in a functional pair (Ding et al. 2021). Whether a pair will be functional is more complex than the number of amino acid differences in the HVR, with the PSD also playing a role (Ding et al. 2021). A larger number of variations in the HVR may be indicative of an increased probability of a functional pairing, but it will not always be the case; the only way to be sure that a pair will result in viable offspring is to combine them in the field.

Varroa destructor was detected in Newcastle, NSW, in June 2022 (Department of Regional NSW 2023) and efforts to eradicate it ceased in September 2023. Most queen breeding outside Western Australia occurs in southeast Queensland, and central and northern NSW, due to favourable climate (Clarke and Le Feuvre 2021). The V. destructor incursion may lead to an increase in local breeding in which beekeepers source bees locally due to border closures and movement restrictions to prevent spread. Local bee populations have shown local adaptations, making them less prone to environmental impacts (Meixner et al. 2014; Walsh and Rangel 2016). If this were to occur, genetic diversity in these populations should be evaluated. Moreover, it is expected that untreated honey bee colonies will die within 1–2 years (Le Conte et al. 2007), and some proportion of treated colonies may also die. This may be further exacerbated by Australia’s honey bee stock having shown a lack of resistance against V. destructor when tested in the USA (Oldroyd 2012; Rinderer et al. 2013). Such a loss is likely to significantly reduce genetic diversity in general. More detailed knowledge of csd diversity across Australia may help to develop management plans to mitigate such losses. It will take some time for V. destructor to spread across such a large country; sampling of commercial and feral populations ahead of the invasion front and over time will allow not only the tracking of loss of genetic diversity, but also the identification of selective sweeps associated with strong selection on genes associated with resistance to V. destructor (Chapman et al. 2023).