1 Introduction

An understanding of the origin and development of biodiversity and distribution patterns, including the geographical origin, refugia, and diversification of organisms, can provide a theoretical foundation for biodiversity conservation (Dawson et al. 2011). For instance, the climatic fluctuations dominated by glacial-interglacial cycles since the Pleistocene are among the important factors that determine the distribution patterns of organisms (Avise 2000). Among these cycles, the last glacial period had the most profound impact. During ice ages, glacial refugia were the habitats to which organisms retreated, and they were also the starting points for re-dispersal after the ice age. The study of glacial refugia is therefore of great significance for understanding speciation mechanisms and for biodiversity conservation (Selwood and Zimmer 2020).

Honey bees (Apis spp.) are essential pollinators in both natural and agricultural ecosystems (Hung et al. 2018). However, honey bees are facing an unprecedented decline in population and biodiversity, which directly damages pollination services and in turn exacerbates human food deficits (Potts et al. 2010; Potts et al. 2016; Requier et al. 2019). Climate change and human economic expansion accelerate habitat change and are considered important causes of this decline (Potts et al. 2010). The outsized importance of honey bees and the potentially devastating impacts of their decline call for a better understanding of the following three questions: (1) How did the diversity and distribution pattern of honey bees develop? (2) Where is the center of origin of honey bees? (3) How does climate change affect their distribution? The answers to the latter two questions are the basis for answering the first.

Nine existing species have been identified under the Apis genus (Han et al. 2012; Koeniger et al. 2011; Lo et al. 2010; Oldroyd and Wongsiri 2009; Raffiudin and Crozier 2007). The native range of the Western honey bee (A. mellifera Linnaeus, 1758) encompasses Europe, Africa, Middle East, Central Asia, and western China, whereas the other eight Apis species are found mainly in Asia (Arias and Sheppard 2005; Chen et al. 2016; Sheppard and Meixner 2003). Interestingly, the distribution regions of all the eight species in Asia overlap in South and Southeast Asia, whereas only the distribution of the Eastern honey bee (A. cerana Fabricius, 1793) extends northward to the temperate zone (Hepburn and Radloff 2011).

There are several hypotheses on the center of origin of the entire Apis genus. One of them places the center of origin in South and Southeast Asia, as all the extant species except A. mellifera are native to that region (Dietz 1982). Another, more specific hypothesis, anchored in theories of angiosperm origin and continental drift, claims that honey bees originated in South Asia (including India and its adjacent areas) (Culliney 1983). There are also hypotheses based on fossil records. For instance, a Miocene fossil of A. miocenica Hong, 1983 was discovered in Shandong, China, from which it was inferred that honey bees originated in the ancient land of North China (Hong 1983). Another study of the Apis fossil record claims that the genus originated in Europe (Kotthoff et al. 2013). However, further research and validation of these hypotheses are still needed.

There are also hypotheses about the location of glacial refugia for some species of honey bees. For instance, Wallberg et al. (2014) proposed that Africa or southern Europe is a glacial refuge for the Western honey bee. Their study on A. mellifera based on genome resequencing suggested that the African populations reached their maxima during glacial periods, whereas European populations peaked in size during interglacial periods (Wallberg et al. 2014). However, their claim lacks evidence from paleoclimate and niche modeling. Moreover, studies on refugia of other species of honey bees are largely lacking.

This study uses molecular biogeographic analysis and niche modeling to test the above hypotheses on the center of origin, glacial refugia, and diversification of honey bees.

2 Materials and methods

2.1 Ecological niche modeling

Three climate datasets at a resolution of 2.5 min were downloaded from WorldClim (http://www.worldclim.org) for niche modeling. These three datasets represent three climate periods: the last glacial maximum (LGM; MIROC-ESM; v1.4), the current period (v2.0), and the future (the year 2070; MIROC-ESM). Global warming caused by greenhouse gas emissions is one of the assumptions of the future climate prediction model (Belda et al. 2016). Each dataset contains 19 bioclimatic variables. To reduce multicollinearity among the bioclimatic variables, the R package “raster” (https://cran.r-project.org/package=raster) was used to calculate the pairwise Pearson correlation coefficients among them. The eight variables (bio02, bio04, bio05, bio06, bio08, bio15, bio16, bio19) with pairwise Pearson correlation coefficients of < 0.75 were retained for further analyses.
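The variable-screening step can be illustrated with a minimal Python sketch. It is not the R workflow used in this study: the function name `select_uncorrelated`, the greedy keep-first rule, the use of absolute correlation values, and the toy data are all assumptions made only for illustration.

```python
import numpy as np

def select_uncorrelated(values, names, threshold=0.75):
    """Greedily keep variables whose |Pearson r| with every kept variable is below threshold.

    values: 2-D array, one row per sampled grid cell, one column per variable.
    names:  variable names, in the same order as the columns.
    """
    corr = np.corrcoef(values, rowvar=False)  # pairwise Pearson correlation matrix
    kept = []
    for j in range(len(names)):
        # Keep variable j only if it is weakly correlated with all variables kept so far.
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data (hypothetical, not WorldClim values): var_b is nearly a copy of var_a.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.1, size=500)
c = rng.normal(size=500)
toy = np.column_stack([a, b, c])

selected = select_uncorrelated(toy, ["var_a", "var_b", "var_c"])
print(selected)
```

Because the rule is greedy, the retained set depends on the order in which the variables are supplied; an actual analysis might instead rank variables by ecological relevance before screening.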

Occurrence records of all nine species of the Apis genus were downloaded from the Global Biodiversity Information Facility (GBIF; https://doi.org/10.15468/dl.6nn7eb). These data were filtered to remove duplicates and records with geo-referencing errors. For A. mellifera, only the occurrence records within its native distribution range (Africa, Europe, Middle East, Central Asia, and western China) were retained for the analyses. A total of 29,828 unique occurrence records were obtained, including records from the published literature (Chen et al. 2016; Chen et al. 2018; Sheppard and Meixner 2003; Yu et al. 2013). A. nigrocincta Smith, 1861; A. koschevnikovi Enderlein, 1906; and A. nuluensis Tingek, Koeniger, and Koeniger, 1996 were excluded from the ecological niche modeling analyses because their occurrence records are scarce. The distribution of occurrence records for each species is shown in Figure 1. The program MaxEnt (Phillips et al. 2006) was used to model the ecological niche and to infer potential distributions. The R package ENMeval (Muscarella et al. 2014) was used to identify the combination of feature types and regularization multiplier with the minimum corrected Akaike information criterion (AICc). We randomly assigned 75% of the occurrence data to model training and the remaining 25% to model testing. The area under the curve (AUC) value was used to evaluate model accuracy.
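The random 75%/25% partition and the AUC criterion can be sketched as follows. This is an illustrative Python sketch rather than the MaxEnt implementation: the function names, the fixed random seed, and the rank-based AUC computed against background scores are assumptions.

```python
import numpy as np

def split_train_test(n_records, train_fraction=0.75, seed=0):
    """Randomly partition record indices into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_records)
    n_train = int(round(train_fraction * n_records))
    return idx[:n_train], idx[n_train:]

def auc_from_scores(presence_scores, background_scores):
    """AUC as the probability that a random presence point scores higher
    than a random background point (ties count as one half)."""
    p = np.asarray(presence_scores, dtype=float)[:, None]
    b = np.asarray(background_scores, dtype=float)[None, :]
    wins = (p > b).sum() + 0.5 * (p == b).sum()
    return float(wins / (p.size * b.size))

train_idx, test_idx = split_train_test(100)
print(len(train_idx), len(test_idx))

# Hypothetical suitability scores, for illustration only.
print(auc_from_scores([0.9, 0.3], [0.5, 0.1]))
```

An AUC of 0.5 corresponds to a model that ranks presences no better than chance, whereas values close to 1 indicate that presence points consistently receive higher suitability scores than background points.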

Figure 1. Distribution of the occurrence records of all the nine extant Apis species. The genus distribution was categorized into ten regions (A–J) using black dashed lines.

2.2 Phylogeny and divergence time estimation

The mitochondrial (mt) genome sequence data of all nine Apis species were downloaded from GenBank, and the accession numbers are shown in Figure 2. Bombus terrestris was used as the outgroup. Thirteen protein-coding genes were extracted and aligned individually using MAFFT 7.310 (Katoh and Standley 2013) with the iterative refinement method G-INS-i (accurate alignment). We pre-defined 39 partitions for the dataset according to the 13 gene regions and 3 codon positions. Subsequently, PartitionFinder 2.1.1 (Lanfear et al. 2012) was used to find the best-fit partitioning schemes and models using a greedy search with AICc. The maximum likelihood (ML) tree was reconstructed using RAxML 8.2.10 (Stamatakis 2014) with the best-fit partitioning schemes and models inferred by PartitionFinder. Node support was assessed with 1000 bootstrap replicates.
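The 39 pre-defined partitions (13 genes × 3 codon positions) can be illustrated with a short Python sketch that derives the column ranges of each partition in a concatenated alignment. The gene lengths are placeholders, the gene order is arbitrary, and the function name `codon_partitions` is hypothetical; the printed `start-end\3` notation follows the common data-block style and is shown only as an example.

```python
# The 13 mitochondrial protein-coding genes. The lengths below are PLACEHOLDERS
# (hypothetical, chosen only to be multiples of 3), not measured alignment lengths.
GENES = ["atp6", "atp8", "cox1", "cox2", "cox3", "cob",
         "nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6"]
placeholder_lengths = {gene: 300 for gene in GENES}

def codon_partitions(gene_lengths):
    """Return (name, start, end) tuples with 1-based inclusive columns.

    Each partition covers every third column between start and end
    of one gene in the concatenated alignment.
    """
    blocks = []
    offset = 0
    for gene, length in gene_lengths.items():
        if length % 3 != 0:
            raise ValueError(f"{gene}: alignment length {length} is not a multiple of 3")
        for pos in (1, 2, 3):
            blocks.append((f"{gene}_pos{pos}", offset + pos, offset + length))
        offset += length
    return blocks

blocks = codon_partitions(placeholder_lengths)
print(len(blocks))
for name, start, end in blocks[:3]:
    print(f"{name} = {start}-{end}\\3;")
```

A greedy search can then merge these starting blocks into a smaller number of subsets that share a substitution model.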

Figure 2. Center of origin and diversification. (a) The divergence time tree. The bars at the nodes show the 95% highest posterior density (HPD) intervals of divergence time. The most likely ancestral regions are indicated by letters at nodes and corners. The ancestral regions at the corners are the immediate states after species divergence. The unit of time is million years ago (Ma). (b) Lineages-through-time plot. The solid line represents the mean value and the dashed lines along with it give the 95% confidence interval (CI). rt: diversification rate at some time t. Nt: number of extant species at some time t.

The BEAST 2.6.2 software package (Drummond et al. 2012) was used to estimate divergence times. As a calibration point, we employed the widely used divergence time of 6 to 8 million years ago (Ma) between A. mellifera and A. cerana (Chen et al. 2018; Wallberg et al. 2014). A Yule tree prior and a lognormal relaxed clock model were applied in the analysis. It should be noted that only one individual was used for each species; otherwise, parameter estimation could have been misled by confusion between the speciation and population models (Drummond et al. 2012). Two independent runs were performed for 100 million generations each, with trees sampled every 10,000 generations. The trees produced by the two runs were combined using LogCombiner 2.6.2 in the BEAST package. The first 25% of trees were discarded as burn-in. The trace files were analyzed using Tracer 1.6 (Rambaut et al. 2014) to obtain the estimated parameter values and to confirm adequate sampling (ESS > 200). The maximum clade credibility tree with mean node heights was generated by TreeAnnotator 2.6.2.

2.3 Ancestral region estimation

Based on studies of natural geographical boundaries and the biogeography of honey bees (Smith et al. 2000; Wallberg et al. 2014), the distributions were categorized into ten regions (see Figure 1): (A) Africa, (B) Europe, (C) the Middle East, (D) India, (E) the southern edge of the Tibetan Plateau (including the southern Himalayas and the Hengduan Mountains), (F) the Indo-China Peninsula, (G) East Asia, (H) Sundaland, (I) the Philippines, and (J) the Australasian zone. The narrowest part of Thailand is the boundary between F and H, and the Wallace Line is the boundary between H and J. We used the R package BioGeoBEARS (Matzke 2013) to estimate the ancestral distribution region. The dispersal-extinction-cladogenesis (DEC) model, the likelihood versions of the DIVA (DIVA-LIKE) and BayArea (BAYAREA-LIKE) models, and each model with an additional parameter j were tested. The parameter j models founder-event speciation (Matzke 2014). The best-fit biogeographic model was selected based on AIC. The BEAST divergence time tree was used as the input tree after excluding the outgroup. The operational taxonomic units (OTUs) of the input time tree had to be species (each species represented by one individual) rather than populations or individuals; otherwise, model confusion would have resulted (Matzke 2014). The maximum number of areas at each node was set to seven. Combinations of non-adjacent areas were excluded from the estimation.
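To give a sense of how the two constraints (a maximum of seven areas per range and the exclusion of non-adjacent combinations) shape the set of candidate ranges, the following Python sketch enumerates ranges under both rules. It is not BioGeoBEARS code. The function names are hypothetical, treating "adjacent" as "connected in an adjacency graph" is an assumption, and the adjacency map is a toy example on four made-up areas (P, Q, R, S), not the adjacency of regions A–J used in this study.

```python
from itertools import combinations

def is_connected(areas, adjacency):
    """Check whether a set of areas forms one connected block in the adjacency graph."""
    areas = set(areas)
    start = next(iter(areas))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency.get(current, ()):
            if neighbour in areas and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == areas

def allowed_ranges(all_areas, max_areas, adjacency=None):
    """List non-empty ranges with at most max_areas areas.

    If an adjacency map is given, ranges that are not connected are dropped.
    """
    ranges = []
    for k in range(1, max_areas + 1):
        for combo in combinations(all_areas, k):
            if adjacency is None or is_connected(combo, adjacency):
                ranges.append("".join(combo))
    return ranges

# Size limit only: ten areas, at most seven per range.
n_size_limited = len(allowed_ranges("ABCDEFGHIJ", 7))
print(n_size_limited)

# Toy adjacency (hypothetical): four areas arranged in a chain P-Q-R-S.
toy_adjacency = {"P": {"Q"}, "Q": {"P", "R"}, "R": {"Q", "S"}, "S": {"R"}}
toy_ranges = allowed_ranges("PQRS", 3, toy_adjacency)
print(toy_ranges)
```

With ten areas, the size limit alone leaves 967 non-empty candidate ranges, so an adjacency filter of this kind can reduce the state space considerably.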

2.4 Diversification rate

The lineage-through-time (LTT) plot for the whole genus was generated with the R package phytools (Revell 2012) from the last 1000 trees sampled by BEAST, after excluding the outgroup. Birth-death likelihood (BDL) analyses (Rabosky 2006) were conducted on these 1000 trees to identify the best-fit model with AIC. The diversification rate (r) is defined as r = b − d, where b and d stand for the speciation and extinction rates, respectively (Rosenzweig and Vetault 1992). We tested two rate-constant models (the pure-birth and birth-death models) and four rate-variable models, namely the density-dependent exponential (DDX) model, the density-dependent linear (DDL) model, the Yule2rate model, and the Yule3rate model. The model with the lowest AIC was considered the best-fit model. The mean value and standard deviation (SD) of each estimated parameter were summarized from the results.
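The model-selection logic can be sketched in a few lines of Python. The sketch uses the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood. The function names, the model labels, and all log-likelihood values are made up for illustration and are not results from this study.

```python
def net_diversification(b, d):
    """Net diversification rate r = b - d (speciation minus extinction)."""
    return b - d

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def rank_by_aic(fits):
    """Sort models by AIC and report the difference from the best model.

    fits: {model_name: (log_likelihood, n_params)}
    Returns a list of (model_name, aic, delta_aic), best model first.
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda item: item[1])

# HYPOTHETICAL log-likelihoods and parameter counts, for illustration only.
example_fits = {
    "one_parameter_example": (-20.0, 1),
    "two_parameter_example": (-17.5, 2),
    "three_parameter_example": (-17.2, 3),
}
ranking = rank_by_aic(example_fits)
for name, score, delta in ranking:
    print(f"{name}: AIC = {score:.1f}, dAIC = {delta:.1f}")
```

In this toy case the two-parameter model is preferred: the third parameter improves the likelihood too little to offset its penalty.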

3 Results

3.1 Center of origin, dispersal, and diversification

The topology of the phylogenetic tree obtained in this study based on the mt genomes (Figure S1) is consistent with those in previous studies (Arias and Sheppard 2005; Koeniger et al. 2011; Lo et al. 2010; Raffiudin and Crozier 2007; Takahashi et al. 2018). Three main clades were recovered. The cavity-nesting honey bees and giant honey bees were sister groups that together formed a larger clade, sister to the dwarf honey bee clade (Figure 2a). The monophyly of these three clades and the phylogenetic relationships among them received very strong node support (bootstrap value = 100) (Figure S1).

The optimal model for ancestral region estimation was DEC + j (Table I). The most recent common ancestor (MRCA) of all living species of this genus is inferred to have been located in the Middle East, India, the Indo-China Peninsula, and Sundaland (CDFH regions), and the age of the ancestral node is about 10 Ma (Figure 2a). These results suggest that the Apis genus originated in tropical Asia (excluding the Philippines) during the Miocene. The CDFH area is also the ancestral region of dwarf honey bees. The ancestral region of all seven other species is located in Sundaland (H region). It is also the ancestral region of cavity-nesting honey bees, Asian cavity-nesting honey bees, A. koschevnikovi, and A. nigrocincta (Figure 2a). These results suggest that Sundaland is the ancestral region of most clades in the Apis genus. Ancestral populations from this region dispersed to the southern Qinghai-Tibet Plateau (E region) around 8 Ma, which was inferred to be the center of origin of giant honey bees. They further dispersed to a wider range in Asia (DEFGHIJ regions) around 2 Ma (Figure 2a).

Table I Parameters for each model estimated by BioGeoBEARS

The best-fit diversification rate model was the DDL model (Table II), which sets the net diversification rate as rt = r0 × (1 − Nt/K), where r0 is the initial diversification rate, Nt is the number of extant species at some time t, and K is analogous to the “carrying capacity” parameter of population ecology (Rabosky 2006). Therefore, in this model, the diversification rate decreases as the number of lineages increases over time. Substituting the estimated values (Table II) into the formula yields rt = 0.37 × (1 − Nt/10.39). Accordingly, the diversification rate decreased from the initial value of 0.37 speciation events per million years (sp/Myr) to about 0.05 sp/Myr at present (Nt = 9). Taken at face value, this model predicts that the number of Apis species is approaching its limit (K = 10.39) and can hardly increase further. This trend is evident in the LTT plot, in which the slope of the curve has gradually decreased since around 5 Ma (Figure 2b).
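The decline implied by the fitted DDL equation can be checked with a short Python calculation. The parameter values r0 = 0.37 and K = 10.39 are the estimates reported above; the function name `ddl_rate` and the choice to tabulate one to nine lineages are illustrative assumptions.

```python
def ddl_rate(n_lineages, r0=0.37, k=10.39):
    """Density-dependent linear model: r_t = r0 * (1 - N_t / K)."""
    return r0 * (1.0 - n_lineages / k)

# Net diversification rate (sp/Myr) as the number of lineages grows from 1 to 9.
rates = {n: ddl_rate(n) for n in range(1, 10)}
for n, r in rates.items():
    print(f"N = {n}: r = {r:.3f} sp/Myr")
```

With the nine extant species, the rate evaluates to roughly 0.05 sp/Myr, and it reaches zero when the number of lineages equals K.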

Table II Results of birth-death likelihood (BDL) analyses

3.2 Glacial refugia and suitable future distribution region

The AUC values of the niche models for all the widespread species except A. andreniformis Smith, 1858 were greater than 0.9 (Table III), indicating excellent model fit. The inferred potentially suitable distribution region for each species under the current climatic conditions was roughly consistent with its actual distribution region (Figure 3). It is worth noting that the ecological suitability of China and Japan for the Western honey bee is very high, although its native range does not cover these areas. In addition, the suitable distribution region of A. cerana in Australia is mainly located at the northern and northeastern edges of the continent (Figure 3).

Table III The area under curve (AUC) values for model training and testing

Figure 3. Potentially suitable regions during the last glacial maximum (LGM), under current conditions, and in the future. Warmer colors indicate regions with higher occurrence probability.

The inferred suitable distribution regions during the LGM suggested possible glacial refugia for each species (Figure 3). The suitable regions of the Western honey bee during the LGM were mainly located in central Africa and Sundaland. Considering its current native range, the glacial refuge of the Western honey bee was most likely located in central Africa. Similarly, the inferred glacial refugia of the other species were as follows: A. cerana, southern China and Sundaland; A. andreniformis, Sundaland; A. florea Fabricius, 1787, the Middle East and the southern Himalayas; and A. laboriosa Smith, 1871, the southern Himalayas and the Hengduan Mountains. The suitable region of A. dorsata Fabricius, 1793 appeared to be less affected by the ice age.

This study also explored how the suitable distribution region of each species will change by the year 2070 under global climate warming (Figure 3). The results showed that the suitable region of each species may contract in the future, except for those of A. mellifera and A. dorsata. Among the contracting species, the reductions for A. cerana and A. andreniformis are particularly severe. Although the suitable region of A. mellifera may expand in central Africa, its extent in Europe will decrease, suggesting that future climate change may threaten the survival of A. mellifera on that continent. The suitable region of A. dorsata tends to expand northward in the future.

After removing the occurrence records of A. cerana and A. florea from their invasive ranges (Papua New Guinea and Australia for A. cerana, and Africa and the Middle East for A. florea), the changes in the results were small. The exception was the inferred glacial refugia of A. florea, which changed to the southern Himalayas, eastern China, and Sundaland (Figure S2).

4 Discussion

The results of this study support the hypothesis that honey bees originated in South and Southeast Asia (Dietz 1982) and further refine it. The Philippines was excluded from the center of origin, and the Middle East was added to it. Sundaland was highlighted as the ancestral region of most lineages within the Apis genus, suggesting that the region lies at the core of the center of origin. Sundaland is also a biodiversity hotspot for honey bees, as all Apis species apart from the Western honey bee are distributed in the region (Hepburn and Radloff 2011). The region may also have been a glacial refuge for A. cerana and A. andreniformis. Together, these findings suggest that this area deserves particular attention when conservation policies for honey bees are formulated. Other studies have also pointed to the significance of Sundaland in honey bee biogeography. For instance, a study based on mtDNA showed that the Sundaland populations of A. cerana constituted one of the four main groups of this species (Smith et al. 2000).

A fossil of the Miocene honey bee A. miocenica was discovered in Shandong, China (Hong 1983). This discovery alone cannot strongly support an origin of honey bees in the ancient land of North China. The fossil simply indicates that the ancestors of honey bees had colonized the ancient land of North China by the Miocene. Moreover, this fossil species is not necessarily a lineal ancestor of extant Apis species but may instead belong to a collateral lineage. The same reasoning applies to the honey bee fossils found in Europe (Kotthoff et al. 2013). For a robust hypothesis on the center of origin of honey bees, it is necessary to obtain fossils from various parts of the world, and their sequences should be analyzed together for more reliable fossil evidence.

This study suggests that honey bees originated in tropical regions. Therefore, the presence of honey bees in the northern temperate zone should be the result of subsequent colonization and adaptation. The honey storage behavior of a colony is one of the key adaptations of honey bees to the temperate zone (Winston 1991). As latitude increases, temperature decreases and winter becomes longer. A bee colony in the temperate zone needs to hoard a large amount of honey during the short flowering season to provide the energy required during the winter (Winston 1991). The stored honey provides humans with an alternative source of sugar and flavor, making honey bees economically important insects. Interestingly, among the nine extant Apis species, only A. mellifera and A. cerana have expanded northward into the temperate zone. This may explain why large-scale apiculture is based on these two species.

This study suggests that the diversification rate within the genus gradually declines with time, and the global climate change in the future will have an impact on the potentially suitable regions of honey bees. This prompts us to pay more attention to the protection of honey bee diversity.

The results of this study advance our understanding of the origin and diversification of honey bees and are expected to provide a theoretical basis for honey bee conservation policies. This study is solely based on the analysis methods of biogeography and niche modeling. Further studies are needed to provide multi-method and multi-field evidence to illuminate the diversity of honey bees in the past, present, and future.