The European dark honey bee, Apis mellifera mellifera, is threatened across most of its native range, partly because of introgressive hybridization with bees of the highly divergent C-lineage, mainly Apis mellifera carnica and Apis mellifera ligustica (De la Rúa et al. 2009; Pinto et al. 2014). Yet maintaining locally adapted genetic diversity is critical for the long-term survival and sustainability of populations (De la Rúa et al. 2009; Meixner 2010). Growing awareness that genetic diversity is important for sustainable beekeeping has led to the implementation of various conservation and breeding programs throughout Europe. These programs need reliable and cost-efficient molecular tools to accurately monitor C-lineage introgression into A. m. mellifera (De la Rúa et al. 2009; Henriques et al. 2018a, b; Meixner 2010). Long mating-flight distances and a polyandrous mating system make it challenging to preserve honey bee subspecies in open conservation areas, where foreign drones can fly in (Neumann et al. 1999). It is therefore necessary to regularly monitor the genetic ancestry of new and superseded colonies.

Several studies have designed diagnostic tools based on nuclear (nc) SNP markers capable of discriminating among honey bee lineages (Chapman et al. 2015, 2017; Muñoz et al. 2015; Parejo et al. 2016). Recently, a set of highly informative ncSNPs for estimating C-derived introgression into A. m. mellifera (Muñoz et al. 2015) was combined into four assays, which were tested and validated for genotyping on the Agena BioScience™ iPLEX MassARRAY platform (Henriques et al. 2018a). Each of the four assays (M1 = 34, M2 = 32, M3 = 28, M4 = 23 ncSNPs) contains fewer markers than the maximum capacity of the iPLEX protocol (40 SNPs), leaving room for additional markers such as mitochondrial SNPs (mtSNPs). Combining SNPs from both the mitochondrial and nuclear compartments allows a more complete identification of ancestry in a single genotyping step.
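As a quick check of the capacity argument above, the short Python sketch below computes how many free slots each assay leaves under the 40-plex ceiling quoted in the text. It captures only the arithmetic: free slots are an upper bound, since any added marker must also yield primers compatible with the existing multiplex, which the design software has to confirm.

```python
from __future__ import annotations

# Maximum multiplex level of one iPLEX reaction, as stated in the text.
IPLEX_MAX_PLEX = 40

# Number of ncSNPs in each assay of Henriques et al. (2018a).
ASSAY_SIZES = {"M1": 34, "M2": 32, "M3": 28, "M4": 23}


def spare_slots(sizes: dict[str, int], max_plex: int = IPLEX_MAX_PLEX) -> dict[str, int]:
    """Return how many additional SNPs each assay could hold before reaching max_plex."""
    return {name: max_plex - n for name, n in sizes.items()}


if __name__ == "__main__":
    for name, free in spare_slots(ASSAY_SIZES).items():
        print(f"{name}: {ASSAY_SIZES[name]} ncSNPs, room for up to {free} more")
```

Under this ceiling alone, M4 has the most headroom (17 slots).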

Analyzing mtDNA in addition to ncDNA enables better-informed decisions in A. m. mellifera conservation and breeding programs. Because nuclear and mitochondrial DNA are transmitted independently (mtDNA is inherited only through the queen), a colony carrying C-derived mtDNA can appear as pure A. m. mellifera at the nuclear level after several generations of backcrossing with A. m. mellifera drones (Jensen et al. 2005; Meixner et al. 2013). Whether a colony showing such cytonuclear incongruence should be kept in a conservation area is then a decision for the conservation manager.
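To make the notion of cytonuclear incongruence concrete, the Python sketch below flags colonies whose nuclear call disagrees with their maternal lineage. The cut-offs on the nuclear C-lineage proportion (0.10 and 0.90), the `Colony` record, and the colony IDs are hypothetical choices for illustration only; the text does not prescribe thresholds, and the final decision remains with the conservation manager.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL thresholds on the nuclear C-lineage proportion (Q_C); each program
# would set its own values. Colonies between the two are treated as admixed.
PURE_M_MAX_QC = 0.10
PURE_C_MIN_QC = 0.90


@dataclass
class Colony:
    colony_id: str
    nuclear_qc: float  # C-lineage proportion estimated from ncSNPs, in [0, 1]
    mt_lineage: str    # maternal lineage called from mtSNPs: "M" or "C"


def nuclear_call(qc: float, m_max: float = PURE_M_MAX_QC, c_min: float = PURE_C_MIN_QC) -> str:
    """Classify nuclear ancestry as "M", "C", or "admixed" from Q_C."""
    if qc < m_max:
        return "M"
    if qc > c_min:
        return "C"
    return "admixed"


def is_cytonuclear_incongruent(qc: float, mt_lineage: str) -> bool:
    """True when the nuclear call is a pure lineage that differs from the mtDNA lineage."""
    nuc = nuclear_call(qc)
    return nuc in ("M", "C") and nuc != mt_lineage


def flag_incongruent(colonies: list[Colony]) -> list[str]:
    """Return the IDs of colonies whose nuclear and mitochondrial ancestry disagree."""
    return [c.colony_id for c in colonies if is_cytonuclear_incongruent(c.nuclear_qc, c.mt_lineage)]


if __name__ == "__main__":
    demo = [
        Colony("col-01", 0.03, "M"),  # concordant A. m. mellifera
        Colony("col-02", 0.04, "C"),  # M-like nucleus carrying C-derived mtDNA
        Colony("col-03", 0.45, "M"),  # admixed nucleus: not called as incongruent here
    ]
    print(flag_incongruent(demo))  # ['col-02']
```

Admixed colonies are deliberately left out of the incongruence flag, since their nuclear ancestry is already visibly mixed.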

The aim of this study was to improve the SNP assays developed by Henriques et al. (2018a) by adding informative mitochondrial SNPs. To this end, we used whole mtDNA sequence data from 155 drones, each representing a single colony, collected across Europe (Table S1) (Henriques et al. 2019; Parejo et al. 2016). Mapping and variant calling followed the best practices detailed in Henriques et al. (2018b) (see Table S1 for further details). A total of 397 SNPs were identified in the 16,343-bp reference mitochondrial genome (Crozier and Crozier 1993). Among the 13 protein-coding genes, ATP8 contained the fewest (5) and COX1 the most (46) SNPs (Table S2). The maternal ancestry of the 155 individuals was assessed with a neighbor-joining (NJ) tree constructed in MEGA7 (Tamura et al. 2013). Of these individuals, 104 were placed in the M-lineage cluster and 51 in the C-lineage cluster (Table S1; Fig. S1).
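To illustrate how per-gene counts such as those in Table S2 can be tallied once SNP positions have been called, here is a small Python sketch. The gene intervals are invented placeholders, not the features of the Crozier and Crozier (1993) reference, and the simple linear scan is an assumption chosen because it handles overlapping features; it is not necessarily how the counts in Table S2 were produced.

```python
from __future__ import annotations

from collections import Counter

# HYPOTHETICAL annotation: (start, end, gene), 1-based inclusive coordinates.
# Placeholder values only, NOT the real features of the reference mitogenome.
ANNOTATION = [
    (100, 199, "geneA"),
    (300, 449, "geneB"),
    (440, 520, "geneC"),  # overlaps geneB, as adjacent mitochondrial genes sometimes do
]


def tally_by_gene(positions: list[int], annotation: list[tuple[int, int, str]]) -> Counter:
    """Count variant positions per feature; a SNP in an overlap counts for every feature."""
    counts: Counter = Counter()
    for pos in positions:
        hits = [gene for start, end, gene in annotation if start <= pos <= end]
        if not hits:
            counts["unannotated"] += 1
        for gene in hits:
            counts[gene] += 1
    return counts


if __name__ == "__main__":
    snps = [150, 160, 310, 445, 510, 600]  # hypothetical SNP positions
    print(tally_by_gene(snps, ANNOTATION))
```

With a few hundred SNPs and a few dozen features, a linear scan is fast enough; an interval tree would only matter at much larger scales.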

To select informative mtDNA markers, we randomly assigned 78 M- and 38 C-lineage individuals to a training set and 26 M- and 13 C-lineage individuals to a holdout set (following Anderson 2010). For each mtSNP, FST between M- and C-derived individuals in the training set was estimated using PLINK 1.9 (Chang et al. 2015), and mtSNPs with FST > 0.75 were selected. The 250-bp flanking sequences of the selected mtSNPs were then used to supplement the four ncSNP assays (Henriques et al. 2018a) with the Replex option of the AssayDesign 4.0 software (Agena BioScience™). The software searched the flanking regions for optimal sites to design forward, reverse, and iPLEX extension primers compatible with those published in Henriques et al. (2018a).
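The selection step above (a stratified random split followed by an FST filter) can be sketched in Python as follows. Two assumptions should be noted. First, a training fraction of 0.75 with rounding reproduces the 78/26 and 38/13 splits, but the seed and sampling procedure are illustrative. Second, the sketch uses Hudson's FST estimator for haploid 0/1 calls in place of PLINK 1.9's `--fst`, which reports Weir and Cockerham estimates. Values can therefore differ somewhat from those in Table S2.

```python
from __future__ import annotations

import random


def stratified_split(ids_by_lineage: dict[str, list[str]], train_frac: float = 0.75, seed: int = 1):
    """Randomly split individuals into training and holdout sets within each lineage."""
    rng = random.Random(seed)
    train: dict[str, list[str]] = {}
    holdout: dict[str, list[str]] = {}
    for lineage, ids in ids_by_lineage.items():
        chosen = set(rng.sample(ids, round(len(ids) * train_frac)))
        train[lineage] = [i for i in ids if i in chosen]
        holdout[lineage] = [i for i in ids if i not in chosen]
    return train, holdout


def hudson_fst(calls_a: list[int], calls_b: list[int]) -> float:
    """Hudson's FST for one biallelic SNP from haploid 0/1 calls in two groups (each n >= 2)."""
    n1, n2 = len(calls_a), len(calls_b)
    p1, p2 = sum(calls_a) / n1, sum(calls_b) / n2
    # Squared frequency difference corrected for within-group sampling variance.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0  # monomorphic in both groups: no differentiation


def select_informative(calls_m: list[list[int]], calls_c: list[list[int]], threshold: float = 0.75) -> list[int]:
    """Return indices of SNPs whose M-vs-C FST exceeds the threshold (SNP-major input)."""
    return [i for i, (a, b) in enumerate(zip(calls_m, calls_c)) if hudson_fst(a, b) > threshold]


if __name__ == "__main__":
    lineages = {"M": [f"M{i:03d}" for i in range(104)], "C": [f"C{i:03d}" for i in range(51)]}
    train, holdout = stratified_split(lineages)
    print({k: len(v) for k, v in train.items()}, {k: len(v) for k, v in holdout.items()})
    # {'M': 78, 'C': 38} {'M': 26, 'C': 13}
```

Splitting within each lineage keeps the M:C ratio similar in both sets, so the holdout evaluation is not dominated by the more numerous M-lineage samples.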

Of the 397 mtSNPs, 193 were highly informative (FST > 0.75; Table S2). The assay design tool was able to incorporate five of them into the M4 ncSNP assay. These five mtSNPs are located in COX1, ND4, and ls-rRNA (Table I). To evaluate their power to distinguish M- from C-lineage colonies, an NJ tree was constructed from the holdout data set. The NJ tree shows that the five mtSNPs are sufficient to discriminate colonies of C- from M-lineage maternal ancestry in the holdout set (Fig. S2).
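The holdout check above can be illustrated with a minimal, topology-only neighbor-joining sketch in Python, using the Saitou-Nei selection criterion on uncorrected p-distances. The haplotypes are invented five-site strings, not the Table I alleles. MEGA7 also estimates branch lengths and offers other distance models, so this sketch shows the principle rather than reproducing Fig. S2.

```python
from __future__ import annotations

import itertools


def p_distance(h1: str, h2: str) -> float:
    """Proportion of differing sites between two equal-length haplotypes."""
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def neighbor_joining(labels: list[str], dist: list[list[float]]):
    """Saitou-Nei neighbor joining; returns the topology as nested tuples (no branch lengths)."""
    trees = dict(enumerate(labels))  # active node id -> subtree
    d = {i: dict(enumerate(row)) for i, row in enumerate(dist)}
    next_id = len(labels)
    while len(trees) > 2:
        active = list(trees)
        n = len(active)
        r = {i: sum(d[i][k] for k in active) for i in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(itertools.combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u, next_id = next_id, next_id + 1
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node.
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        trees[u] = (trees.pop(i), trees.pop(j))
    return tuple(trees.values())


def clades(tree) -> list[frozenset]:
    """Leaf sets of every subtree; the last entry is the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    out: list[frozenset] = []
    leaves: frozenset = frozenset()
    for child in tree:
        sub = clades(child)
        out.extend(sub)
        leaves |= sub[-1]
    out.append(leaves)
    return out


if __name__ == "__main__":
    # Hypothetical five-SNP holdout haplotypes (not real data).
    haps = {"m1": "ACGTA", "m2": "ACGTA", "m3": "ACGCA",
            "c1": "GTATG", "c2": "GTATG", "c3": "GTACG"}
    names = list(haps)
    dist = [[p_distance(haps[a], haps[b]) for b in names] for a in names]
    tree = neighbor_joining(names, dist)
    m_clade = frozenset(x for x in names if x.startswith("m"))
    # In an unrooted tree a lineage is separated if it, or its complement, is a clade.
    print(m_clade in clades(tree) or frozenset(names) - m_clade in clades(tree))  # True
```

Because NJ trees are unrooted, the check accepts either lineage appearing as a clade; the final join of the last two nodes carries no rooting information.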

Table I Genomic information on the five selected mtSNPs that distinguish C and M maternal lineages

The five informative mtSNPs could only be added to the previously designed M4 ncSNP assay (Henriques et al. 2018a), which contained the fewest ncSNPs (23). Henriques et al. (2018a) showed that assay M1 (34 ncSNPs) had the best individual performance in estimating nuclear C-lineage introgression. Accordingly, a good approach for concurrent inference of nuclear and mitochondrial ancestry could be to combine M1 with the new 28-plex M4 (23 ncSNPs plus 5 mtSNPs). We therefore tested whether the introgression proportions (Q values) estimated from the ncSNPs of assays M1 + M4 were similar to those obtained from the four assays together (Henriques et al. 2018a). Of the 117 ncSNPs included in the four assays, 113 could be extracted from the 155 whole-genome (WG) sequences. We ran ADMIXTURE (Alexander et al. 2009) on the 155 individuals using the four assays combined (113 ncSNPs) and M1 + M4 (55 ncSNPs, as 2 of the 57 were absent from the WG sequences). Q values were estimated for K = 2 using 10,000 iterations in 20 independent runs. Q values estimated with M1 + M4 were highly correlated with those estimated with the four assays (r = 0.995, P < 0.001), and the differences between them were negligible (mean absolute difference = 0.027; Table S3).
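The agreement statistics reported above (Pearson r and mean absolute difference between paired Q values) are straightforward to compute once each panel's C-lineage proportions are available. The Python sketch below assumes the relevant column has already been read from each ADMIXTURE `.Q` output. Its label-alignment step for averaging independent K = 2 runs is one reasonable way to handle cluster-label switching, and it may not be what was done here; dedicated tools such as CLUMPP handle this more generally. P values are not computed, and all Q values shown are hypothetical.

```python
from __future__ import annotations

import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_abs_diff(x: list[float], y: list[float]) -> float:
    """Mean absolute difference between paired Q values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def average_runs_k2(runs: list[list[float]]) -> list[float]:
    """Average one K = 2 ancestry column over runs, flipping runs whose cluster labels are swapped."""
    ref = runs[0]
    aligned = [q if pearson_r(ref, q) >= 0 else [1.0 - v for v in q] for q in runs]
    return [sum(vals) / len(vals) for vals in zip(*aligned)]


if __name__ == "__main__":
    # Hypothetical Q_C values for five colonies from two SNP panels (not the study's data).
    q_four_assays = [0.02, 0.10, 0.55, 0.91, 0.98]
    q_m1_m4 = [0.04, 0.08, 0.60, 0.88, 0.97]
    print(round(pearson_r(q_four_assays, q_m1_m4), 3), round(mean_abs_diff(q_four_assays, q_m1_m4), 3))
```

Reporting both statistics is useful because a high correlation alone would not reveal a systematic offset between panels, whereas the mean absolute difference does.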

Accurate identification of C-lineage-introgressed colonies is needed for efficient management of A. m. mellifera conservation programs. A more complete identification requires informative markers from both the nuclear and mitochondrial compartments (Meixner et al. 2013). For example, in this study, cytonuclear incongruence was observed in two A. m. ligustica (C-lineage) individuals carrying mitochondria of M-lineage ancestry (Table S3). Here, we improved the recently designed M4 ncSNP assay (Henriques et al. 2018a) by adding five mtSNPs, and we provide PCR and iPLEX primers for genotyping on the MassARRAY platform (Table S4). We showed that C-lineage introgression into A. m. mellifera can be accurately estimated by combining the new cytonuclear M4 assay with the best-performing M1 nuclear assay developed previously (Henriques et al. 2018a). The combined M1 + M4 assays, containing 57 nuclear and five mitochondrial SNPs, provide a robust molecular tool to support management decisions aimed at protecting and reestablishing the endangered A. m. mellifera.