Background

Over the past few decades, the numbers of many insects have declined rapidly [1], including those of bumble bees [2], caterpillars [3], beetles [4], and honeybees [5]. The pressures on insect survival behind these declines are numerous and include invasive species, habitat loss, pesticide abuse, environmental pollution, and climate change [6]. Of these factors, climate change, alone or in combination with the other factors, is most likely to be the main cause of large-scale global declines in insect populations and diversity [7,8,9,10,11,12]. However, evolutionary adaptations can potentially help species cope with the survival pressure brought about by climate change [13]. Apis cerana is an important crop-pollinating insect in China. Its long collection cycle and ability to utilize sporadic nectar sources have enabled A. cerana to play a key role in the development of Chinese agriculture. During long-term natural evolution, many unique phenotypic variations have arisen among A. cerana populations in different regions under different climatic conditions [14, 15]. In general, moving from south to north across its geographical distribution, A. cerana shows a gradual trend toward weaker migration, slightly larger body size, darker body color, and stronger cold resistance [16]. This adaptive radiation across different climatic conditions reflects exceptional stress resistance, making A. cerana an excellent subject for studying the role of climatic adaptation in adaptive evolution [17]. Thus, studying the climatic adaptation of A. cerana can help conserve the species effectively in the face of climate change. At the same time, understanding the genetic basis of these unique phenotypic variations benefits their conservation and provides insight into how to better harness the genetic resources of A. cerana.

So far, many studies have used different methods to elucidate the basis of the unique phenotypic differences in A. cerana. Based on direct measurements of the phenotypic and morphological characteristics of A. cerana in different regions [18], nine varieties were classified. With the development of molecular biology technology, population genetics studies revealing genetic diversity based on sequencing the individual genomes of different bee species have been widely used [19,20,21]. Once the genome of A. cerana was published in 2015 (GCA_001442555.1) [22], genome research entered a rapid stage (GCA_002290385.1 and GCA_011100585.1) [23, 24]. Studies on the origin and evolution [14, 15, 17, 18, 25] and sequencing of A. cerana with special traits—A. cerana abansis (Aba strain) [26] and the endemic A. cerana with cold resistance [27]—progressed rapidly. These studies have also demonstrated distinct variations in DNA and microsatellites among A. cerana from different regions. A 2020 study identified the role of the leucokinin receptor (Lkr) in influencing the foraging division of labor [18]. However, existing research is far from sufficient to fully elucidate the climatic adaptation during the evolutionary process of A. cerana.

In this study, we analyzed 100 samples from 10 regions in China (covering the most important climate types) and used population genetics methods to identify genome-level signatures of adaptation to different environments, with the aim of revealing the genetic basis of the strong environmental adaptability of A. cerana. We also chose collection sites at similar latitudes or longitudes to explore the influence of geographical coordinates on the adaptive radiation of A. cerana. By analyzing the genetic structure, genetic differentiation, and population diversity of these 10 populations, and by combining selective sweep analysis with morphometric analysis of the highly differentiated populations, we identified key signaling pathways and genes involved in adaptation to different climatic conditions in China. This study provides a useful reference for improving the conservation and breeding of A. cerana.

Results

Sampling and sequencing

We analyzed samples from 10 regions of China, combining newly collected and resequenced samples with previously published resequencing data from three regions (Table 1, Fig. 1). A total of 429.71 Gb of raw data were obtained from 100 samples. After quality control, the total data volume was 418.97 Gb, the average alignment rate against the reference genome was 90.98%, and the average sequencing depth was 19.16×. The average Q20 and Q30 were 96.65% and 91.59%, respectively.

Table 1 Information on sampling sites
Fig. 1 Sampling sites of Apis cerana. Samples from regions in blue were obtained from previously published data

Population genetic structure analysis

Admixture, an admixture model-based program, was used to estimate the population structure. At K = 2, MK, JL, and WC formed an ancestral cluster, while the other populations showed different degrees of mixed ancestry. As K increased from 3 to 5, MK, JL, and WC showed lineages distinct from the other populations, with ZZ and MY forming an ancestral cluster (Fig. 2). The cross-validation error for the 100 samples was smallest at K = 4 (Fig. S1). Notably, the SZ and QM groups remained indistinguishable as K increased from 4 to 9 (Fig. 2). Principal component analysis (PCA) gave a similar pattern of population subdivision: MK, JL, and WC were clearly separated along the first and second principal components (Fig. 3). Together, the structure analysis and PCA of the 10 regional groups show that JL, MK, and WC are highly differentiated from the remaining populations, which are closely related and difficult to separate from one another. Based on the obtained SNP information, we also constructed a phylogenetic tree using the neighbor-joining method (Fig. 4).

Fig. 2 Analysis of population structure

Fig. 3 Principal component analysis of the 10 populations

Fig. 4 Phylogenetic tree of Apis cerana

Genetic differentiation and genetic diversity

To understand the genetic differentiation among the 10 groups, we calculated pairwise FST (Table 2) and genetic diversity parameters, including observed heterozygosity (Ho), expected heterozygosity (He), and polymorphism information content (PIC) (Table 3). The maximum FST was 0.39 (between WC and JL), and the minimum was less than 0.01 (0.00 when rounded to two decimal places). These results also showed that JL, MK, and WC were highly differentiated (FST > 0.15), which is consistent with the population structure analysis and PCA. JL displayed the highest degree of genetic differentiation (average FST = 0.28), followed by WC (average FST = 0.19) and MK (average FST = 0.16). The remaining regions showed low genetic differentiation. Among the 10 populations, genetic diversity was highest in GZ (PIC = 0.206) and SZ (PIC = 0.205). The genetic diversity of JL (PIC = 0.099), MK (PIC = 0.140), and WC (PIC = 0.148) was lower than that of the other regions, which may indicate that bees in these three areas have been subjected to stronger natural selection.

Table 2 Pairwise FST distance between 10 populations of Apis cerana
Table 3 Genetic diversity parameters of Apis cerana

Historical effective population sizes

Based on the population structure analysis, PCA, and genetic differentiation analysis above, JL, MK, and WC each form a subpopulation separated from the remaining populations. Moreover, these three populations and a fourth group comprising the remaining seven populations each occupy a different regional climate type. Jilin (JL) is in northeastern China and has a temperate continental monsoon climate; Mangkang (MK) is on the Qinghai-Tibet Plateau and has a typical plateau climate; Wenchang (WC) is on Hainan Island and has a tropical monsoon island climate; and the remaining populations are mostly in the subtropical monsoon climate region. The SZ group was selected to represent the subtropical monsoon climate region. To understand the important evolutionary events that occurred during the adaptation of these four highly differentiated populations of A. cerana, we estimated their historical effective population sizes (Fig. 5). According to the estimates, the four populations had nearly the same effective population size 80–200 thousand years ago (ka). About 70 ka, the JL group began to show a change in population size that differed from that of the other groups. The population sizes of MK, WC, and SZ, at middle and low latitudes, increased at this time, which may reflect continued population differentiation and admixture in these regions, whereas the JL population did not undergo this process. Overall, populations at high latitudes showed different demographic trends from those at low and middle latitudes.

Fig. 5 Estimation of historical effective population sizes

Morphometric analysis

We measured 10 morphological indicators related to body size in the samples collected from WC, MK, and SZ: the right forewing length (FL), the right forewing width (FB), the sixth sternum length (L6), the sixth sternum width (T6), the third sternum length (S3), the combined length of the third and fourth terga (T3 + T4), the femur length (Fe), the tibia length (Ti), the basitarsus length (ML), and the basitarsus width (MT) (Fig. S2). One-way ANOVA of the 10 indicators showed that, except for MT, which did not differ significantly between the SZ and WC groups (P > 0.05), all indicators differed significantly among the three groups (P < 0.01) (Table 4 and Fig. 6). These results confirm that, in terms of body size, the MK population is significantly larger than the SZ population, which in turn is significantly larger than the WC population.
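As an illustration of the one-way ANOVA applied to each indicator, the sketch below computes the F statistic for three groups. The function name and the forewing-length values are hypothetical and are not taken from Table 4; obtaining a P value would additionally require the F distribution, which this sketch does not cover.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical forewing lengths (mm), for illustration only
mk = [8.9, 9.0, 9.1]
sz = [8.5, 8.6, 8.7]
wc = [8.1, 8.2, 8.3]
f_stat, df_b, df_w = one_way_anova_f([mk, sz, wc])
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom corresponds to a small P value, as reported for nine of the ten indicators.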

Table 4 Summary morphological data of the three geographical populations
Fig. 6 Boxplot of 10 morphological indicators of Apis cerana among the three geographical populations. A The right forewing length (FL). B The right forewing width (FB). C The sixth sternum length (L6). D The sixth sternum width (T6). E The third sternum length (S3). F The third tergum length and the fourth tergum length (T3 + T4). G The femur length (Fe). H The tibia length (Ti). I The basitarsus length (ML). J The basitarsus width (MT). Note: ** means significant difference at 0.01 level

Selective sweep analysis

To further study the adaptive radiation of A. cerana, such as the differences in body size among regions of China, and the genome-level changes that arose during adaptation to the climate of each region under natural selection, we estimated pairwise genetic differentiation (FST) and differences in nucleotide diversity (π ratios) among the four population types to identify key selective sweeps. We first performed linkage disequilibrium (LD) decay analysis on the 40 samples from JL, MK, WC, and SZ. At half of the maximum r², the decay rate was ordered MK > JL > WC > SZ (Fig. S3). This also suggests that the bees in MK, JL, and WC were subject to more intense selection pressure than those in SZ; therefore, we selected SZ, with its richer genetic diversity and lower selection pressure, as the reference population. The JL, MK, and WC populations were each scanned against the SZ reference population to identify regions of the A. cerana genome that have been under natural selection, thereby revealing the adaptive evolution of A. cerana.

Selective sweep regions were defined as the intersection of the top 5% of windows for each of two indices (FST and π ratios). With the SZ population (subtropical monsoon oceanic climate) as the reference, selection signal analysis of the highly differentiated populations identified 839 candidate regions involving 527 genes in the JL population (temperate monsoon climate) (Fig. 7A and Table S3), 589 candidate regions involving 565 genes in the MK population (plateau climate) (Fig. 7B and Table S4), and 224 candidate regions involving 311 genes in the WC population (tropical monsoon island climate) (Fig. 7C and Table S5). In addition, 33 genes were identified in all three comparisons (Fig. 7D and Table S6). These 33 genes may play important roles in the adaptation of A. cerana to different climatic conditions in China. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed on these 33 candidate genes. The 33 shared genes were annotated to 846 GO terms and 44 KEGG pathways, of which 138 GO terms and 6 KEGG pathways were significantly enriched (P < 0.05) (Table S7 and Table S8). GO enrichment analysis revealed that the biological process with the most enriched genes was single-organism process, the cellular components with the most enriched genes were cell and cell part, and the molecular function with the most enriched genes was binding (Fig. S4). The six significantly enriched KEGG pathways were amino sugar and nucleotide sugar metabolism, peroxisome, RNA degradation, AMPK signaling pathway, ubiquitin-mediated proteolysis, and 2-oxocarboxylic acid metabolism. These results suggest that A. cerana responds to natural environmental stress primarily by regulating sugar and protein metabolism. In addition, among the top 20 enriched pathways, we noticed some that may be related to the adaptation of honeybees to regional climates, such as the citrate cycle (TCA cycle), which is related to energy metabolism; the thermogenesis pathway, which is related to adaptation to changes in ambient temperature; and the Hedgehog signaling pathway, which is involved in regulating gene expression (Fig. S5).

Fig. 7 Selective sweep analysis against the reference SZ population. A Results of selective sweep analysis (SZ/JL). B Results of selective sweep analysis (SZ/MK). C Results of selective sweep analysis (SZ/WC). The selected intersect regions in red areas are at the top 5% level by FST and π ratios. D Venn diagram of selected genes

Discussion

Our results reveal an important link between climate type and the adaptive evolution of A. cerana in China. Analyses of population structure, genetic diversity, and genetic differentiation based on whole-genome resequencing data indicate that A. cerana populations with unique phenotypic variations from different climate regions (Jilin, Mangkang, and Wenchang) have each formed an independent subpopulation. Furthermore, we confirmed significant differences in the body size of A. cerana by comparing morphological data. This evidence shows that the JL, MK, and WC populations were highly differentiated compared with the other populations, which may be a consequence of different climate types and environmental selection pressures. This is in line with the findings of a study on the adaptive evolution of A. cerana in which isolated groups showed a high degree of differentiation and lower genetic diversity, whereas other groups were less differentiated and had higher genetic diversity [14]. In the present study, the average PIC of the 10 populations was 0.17, and the PIC values of all populations were lower than 0.5, representing low polymorphism and low genetic diversity of A. cerana. Human activities are considered an important reason for the decrease in the genetic diversity of A. cerana in recent years. They have reduced the habitat area of A. cerana and, with it, the wild resources of the species. A large reduction in the number of wild bees decreases the effective population size, which may directly reduce the genetic diversity of A. cerana. In addition, the introduction of A. cerana from other regions has affected the original genetic structure and genetic diversity of local populations. Rich genetic diversity can enhance the adaptability of species to their environment, maintain population balance more stably [28, 29], and improve production performance [30]. Therefore, it is imperative to protect and rationally utilize the genetic resources of A. cerana.

It is also worth noting that geographic latitude had a strong influence on the diversity of these populations. This is supported by the estimates of effective population size. The different patterns of change in the effective population size of A. cerana at high and low latitudes also reflect the different impacts of abrupt climate change on populations at different latitudes. Our study shows that this discrepancy first arose around 70 to 80 ka. Since the last glacial period of the Quaternary Ice Age [31, 32], different degrees of cooling have occurred throughout China, with remarkable temporal and spatial differences in cooling rates. Notably, cooling was greater in winter than in summer and at high latitudes than at low latitudes. This widespread cooling may have prompted high-latitude populations to migrate and merge with low-latitude populations. During this process, species at high latitudes could have been subjected to stronger natural selection than those at low latitudes.

The main driving forces of species evolution include selection and genetic drift. When these forces act together, drift may accelerate the increase in frequency of a mutant, or it may slow the effect of selection or even reduce the frequency of the mutant. The intensity of genetic drift depends on the effective population size: the larger the effective population size, the weaker the drift; the smaller the effective population size, the stronger the drift. According to our estimates, the effective population size of A. cerana in Jilin is lower than that of the other populations. This suggests that A. cerana in Jilin may have experienced severe bottleneck events and that its evolution has been more strongly affected by genetic drift. This is consistent with previous reports that human capture has greatly changed the population dynamics of A. cerana in the area around JL in recent decades [22, 25]. Therefore, to reduce false positives caused by genetic drift and other factors in single-SNP scans, we calculated FST between populations in sliding windows, which increases the sensitivity of selection signal detection.

In this study, we focus mainly on the relationship between genetic variation and climatic conditions. Hence, we pay attention to factors related to climate, such as latitude, longitude, and altitude. It is worth noting that in adaptive evolution, especially given the short generation interval of bees, the impact of artificial domestication cannot be ignored. Changes in genomic features caused by either artificial or natural selection are both called selection signals, and it is difficult to distinguish which of the two produced a given signal. Therefore, when using selective sweep analysis to study the evolution of a species, it is necessary to choose research subjects appropriate to the question. We selected A. cerana rather than Apis mellifera to study climatic adaptation because the domestication history of A. cerana is shorter and its degree of domestication is weaker than that of A. mellifera. In addition, the samples we collected were A. cerana living in different geographical environments in a wild or semi-wild state. These samples are relatively less affected by artificial domestication. We believe that such samples are suitable for studying climatic adaptation during the adaptive evolution of A. cerana and that the results based on them are representative.

Long-term natural selection often changes the allele frequencies of selected loci and their linked loci. By analyzing these traces of natural selection in the genome, we can better understand some of the important events in evolution. We used selective sweep analysis to examine the relationship between climate type and the adaptive evolution of A. cerana. Positive selection increases the frequency of the favored allele at a locus, which in turn reduces genetic polymorphism at the selected locus. Therefore, for the A. cerana populations under the four climate types, we set the SZ population as the reference, based on the calculated population polymorphism (SZ > WC > MK > JL). Abundant polymorphism implies weaker selection pressure; hence, using the SZ population as the background facilitates the detection of traces of selection in the remaining populations. In addition, we used two different statistics (FST and π ratios) to detect selection signals. Requiring the two methods to overlap provides mutual verification and can further reduce false positives to a certain extent. The pairwise comparisons yielded a large number of signals, but not all of them are related to adaptation to climate and body size; therefore, we paid more attention to the signals detected in all three selective sweep analyses. These signals will help provide insight into the climatic adaptation mechanisms of A. cerana. According to the results of the GO and KEGG pathway analyses, we speculate that the adaptive evolution of A. cerana may be preferentially reflected in its response to the abundance of food resources and changes in temperature. A. cerana actively regulates its metabolism to cope with food shortages in harsh environments and to maintain its body temperature. This also suggests that the energy utilization patterns of A. cerana may differ among climate types.

Based on our findings, we speculate that selection on RAPTOR could help A. cerana adapt to the complex climatic environment in China. A total of 33 candidate genes related to climatic adaptation were identified by selective sweep analysis. By combining the annotation results with existing research reports, we identified RAPTOR as a candidate gene related to body size in climate adaptation. RAPTOR lies at the intersection of the AMPK and mTOR signaling pathways. The evidence to date suggests that the mTOR signaling pathway is involved in the development of A. mellifera [33,34,35,36]. Studies have shown that the mTOR signaling pathway plays a key role in the development of the queen bee [37,38,39,40,41], and knockdown of the TOR-encoding gene can affect queen fate [37]. Under nutrient deprivation, AMPK, an important kinase that regulates energy homeostasis, directly phosphorylates RAPTOR to inhibit mTORC1, thereby inhibiting cell growth [42]. mTORC1 participates in glucose metabolism indirectly, in response to insulin secreted when glucose enters the blood, or is directly activated by amino acids [43, 44]. The mTOR pathway is also involved in regulating autophagy. When an insufficient nutrient supply reduces mTORC1 activity, the inhibition of autophagy by mTORC1 is relieved, and lysosomes degrade less critical proteins and organelles to provide the material and energy needed for basic cell survival [45, 46]. Bees absorb nutrients (glucose and amino acids) by feeding on plant pollen and nectar [23]. Native plants are affected by local climatic conditions, with shorter flowering periods and lower abundance in colder regions than in warmer regions [26, 47]. As a result, the difficulty of finding food and obtaining nutrients differs for A. cerana in different regions. Selection on RAPTOR at the genome level could be a response to changes in climate and food sources. Early studies also reported the functions of the RAPTOR and TOR pathways in Drosophila [43, 46, 48, 49]. Decreased TOR activity inhibits ecdysone release and leads to prolonged development time and increased body size, whereas activating TOR can reverse stunting caused by nutrient restriction [50]. In addition, RAPTOR was upregulated in a study of cold resistance in Drosophila [51]. A recent study on thermal and oxygen flight sensitivity in Drosophila showed that downregulation of TOR activity produces smaller flies with smaller wing epidermal cells, and flies with small cells can maintain superior performance in metabolically demanding activities, such as flying under hypoxic conditions [52]. Whether A. cerana has adopted the same strategy as other insects to cope with different climate types requires further investigation.

In summary, our findings suggest that climate type, geographic location, and food resources are all related to the adaptive radiation distribution and unique geographic phenotypes of A. cerana. It is highly probable that food and nutrition deficits caused by diverse climates are the predominant driving forces for unique geographic phenotypes. Further research is necessary to verify the molecular mechanism between RAPTOR and the body size.

Conclusions

Our study shows that the genetic structure and genetic differentiation of A. cerana are related to the distribution of climate and environment in China and are strongly affected by latitude. A key gene, RAPTOR, plays an important role in this process. The selection of RAPTOR at the genomic level helps A. cerana respond to nutritional problems caused by harsh environments by modulating TOR activity to alter body size, which partly explains the difference in body size among A. cerana distributed in China. These results help us understand the genetic basis of the climatic adaptation of A. cerana and provide a reference for the protection and utilization of germplasm resources and future genetic improvement.

Methods

Sample collection

A total of 100 honeybee samples were studied in this experiment, 70 of which were collected from Tibet, Hubei, Anhui, Jiangsu, Guangdong, and Hainan provinces in China. Collection sites were chosen with Suzhou, Jiangsu, as the central reference point and were located at similar latitudes and different longitudes, or at similar longitudes and different latitudes. In addition, 30 samples from sites with a similar longitude to Suzhou but different latitudes were included for comprehensive data analysis [14, 18]. At each sampling point, ten colonies were sampled, with 3–5 bees collected per colony. The collected bee samples included wild and semi-wild local bees. Samples were placed in 75% alcohol and then stored at −20 °C for future use.

Whole-genome sequencing, quality control, and clean reads mapping

Total genomic DNA was extracted from samples, and at least 3 µg genomic DNA was used to construct paired-end libraries with an insert size of 500 bp using a Paired-End DNA Sample Prep kit (Illumina Inc., San Diego, CA, USA). These libraries were sequenced using the HiSeq X10 NGS platform (Illumina Inc., San Diego, CA, USA). Raw reads were processed to obtain high-quality clean reads according to two stringent filtering standards: 1) removing reads containing > 50% of low-quality bases (Q < 20) or > 10% unidentified nucleotides (Ns); and 2) removing reads aligned to the barcode adapter. The Burrows-Wheeler Aligner (BWA) was used to align the clean reads against the reference genome (GCA_011100585.1) with the settings ‘mem 4 -k 32 -M’ [53]. Duplicates were marked using Picard 2.18.7 (http://broadinstitute.github.io/picard/).
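The first filtering standard can be expressed as a simple per-read rule. The sketch below is a minimal illustration, assuming each read is available as a base string with a list of Phred quality scores; the function name and inputs are hypothetical, and the adapter-removal step (standard 2) is not modeled.

```python
def passes_quality_filter(bases, quals, min_q=20, max_low_frac=0.5, max_n_frac=0.1):
    """Return True if a read survives filtering standard 1.

    bases: read sequence as a string
    quals: per-base Phred quality scores, same length as bases
    """
    if not bases or len(bases) != len(quals):
        raise ValueError("bases and quals must be non-empty and of equal length")
    n = len(bases)
    low_frac = sum(q < min_q for q in quals) / n   # fraction of bases with Q < 20
    n_frac = sum(b in "Nn" for b in bases) / n     # fraction of unidentified bases
    # A read is removed if it has > 50% low-quality bases or > 10% Ns
    return low_frac <= max_low_frac and n_frac <= max_n_frac
```

For paired-end data, a pipeline would normally drop both mates when either one fails; that choice is not specified in the text and is left out here.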

Variant identification and annotation

Variant identification was performed using the Genome Analysis Toolkit (GATK) [54]. SNPs were filtered using GATK Variant Filtration with proper standards (-Window 4, -filter "QD < 2.0 || FS > 60.0 || MQ < 40.0", -G_filter "GQ < 20"). SNPs were filtered according to two stringent filtering standards: 1) missing ratio < 20%; and 2) minor allele frequency (MAF) > 5%. The ANNOVAR software [55] was used to annotate SNPs.
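The two population-level SNP filters (missing ratio < 20% and MAF > 5%) can be sketched as follows. This is an illustration only, assuming biallelic SNPs with genotypes coded as alternate-allele counts (0, 1, 2) and None for a missing call; the function names are hypothetical and are not part of GATK.

```python
def snp_stats(genotypes):
    """Return (missing_ratio, maf) for one biallelic SNP.

    genotypes: list of alternate-allele counts (0, 1, 2), or None if missing
    """
    called = [g for g in genotypes if g is not None]
    missing_ratio = 1 - len(called) / len(genotypes)
    if not called:
        return missing_ratio, 0.0
    alt_freq = sum(called) / (2 * len(called))  # diploid: two alleles per sample
    return missing_ratio, min(alt_freq, 1 - alt_freq)

def keep_snp(genotypes, max_missing=0.20, min_maf=0.05):
    """Apply the two thresholds stated in the text."""
    missing_ratio, maf = snp_stats(genotypes)
    return missing_ratio < max_missing and maf > min_maf
```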

Principal component analysis (PCA), population structure analysis, and phylogenetic analysis

The admixture model-based software, Admixture V1.3.0 [56], was used to estimate the population structure. The tested K was set from 1 to 9, and the optimal K was determined based on the lowest cross-validation error. The population subdivision pattern was preliminarily classified using PCA in the GCTA software [57]. We constructed a phylogenetic tree using the neighbor-joining method with TreeBeST software [58]. The bootstrapped confidence interval was based on 1000 replicates.
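Choosing the optimal K amounts to taking the minimum cross-validation error over the tested values. The sketch below assumes the log lines that Admixture prints when run with cross-validation enabled (of the form "CV error (K=4): ...") have been gathered into one string; the error values shown are invented for illustration and are not those in Fig. S1.

```python
import re

# Matches lines such as "CV error (K=4): 0.522"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def best_k(log_text):
    """Return (K with the lowest CV error, dict of all parsed errors)."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get), errors

# Hypothetical log excerpt, for illustration only
example_log = """\
CV error (K=1): 0.612
CV error (K=2): 0.548
CV error (K=3): 0.531
CV error (K=4): 0.522
CV error (K=5): 0.529
"""
k, errors = best_k(example_log)
```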

Genetic diversity and FST statistics

Observed heterozygosity (Ho) and expected heterozygosity (He) for each group or population were calculated using Plink [59]. Ne and PIC were calculated using a Perl script. The pairwise FST matrix was calculated using Genepop software [60].
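For reference, the per-locus quantities can be written out directly from allele frequencies. The sketch below assumes the commonly used PIC definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j²; whether the Perl script used here applies exactly this formula is an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies at one locus."""
    return 1 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content at one locus (assumed standard definition)."""
    correction = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - correction

# Biallelic SNP with equal allele frequencies
he_even = expected_heterozygosity([0.5, 0.5])
pic_even = pic([0.5, 0.5])
```

Under this definition, the PIC of a biallelic SNP peaks at 0.375 when both alleles have a frequency of 0.5, which is useful context when interpreting SNP-based PIC values.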

Historical effective population sizes

SMC++ 1.13.1 [61] was used to estimate the changes in Ne over the past one million years (1 Ma). The mutation rate was set to 5.27 × 10⁻⁹ per base pair per generation, following a divergence estimate of 7 Ma between A. mellifera and A. cerana [62]. The generation time was assumed to be one year. The polarization error was set to 0.5.

Morphometric analysis

A total of 130 worker bees (30 from MK, 50 from WC, and 50 from SZ) were used for morphometric measurements. The bees were dissected to prepare specimens of each body part, which were observed and photographed under a stereo microscope and measured using a measurement system (M-Shot Image Analysis System V1.1.4). Each sample was measured three times, and the mean value was used.

Linkage disequilibrium (LD)

To evaluate the LD pattern, we estimated the squared allele frequency correlation (r²) using Haploview 4.2 [63]. LD decay graphs were plotted using an R script.
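The r² statistic for a pair of biallelic loci can be illustrated as below. This sketch assumes phased haplotypes coded as (allele at locus A, allele at locus B) with alleles 0 or 1; it is not a reimplementation of Haploview, which estimates haplotype frequencies from genotype data, and the function name is hypothetical.

```python
def ld_r2(haplotypes):
    """Squared allele frequency correlation between two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

An LD decay curve is then obtained by averaging r² over SNP pairs binned by physical distance.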

Selective sweep analysis

For the 40 samples from JL, MK, WC, and SZ, we performed selective sweep analysis, with SZ as the reference population. We estimated pairwise genetic differentiation (FST) [64] and differences in nucleotide diversity (π ln-ratio) [65] among the four populations to identify key selective regions using the PopGenome software [66] with a sliding window approach. We set the window size to 100 kb and the step size to 10 kb. Selective sweep regions were defined as the intersection of the two indices (FST and π ratios), each at a threshold of the top 5%. All related graphs were drawn using R scripts. Candidate genes within sweep regions were extracted for GO and KEGG enrichment analysis.
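The window layout and the top 5% intersection rule can be sketched as follows. This is a minimal illustration rather than the PopGenome workflow: it assumes per-window FST and π-ratio values have already been computed, that larger values indicate a stronger signal for both statistics, and that truncated windows at the chromosome end are kept; all function names are hypothetical.

```python
def sliding_windows(chrom_length, window=100_000, step=10_000):
    """Yield half-open (start, end) windows along one chromosome."""
    start = 0
    while start < chrom_length:
        yield start, min(start + window, chrom_length)
        start += step

def top_fraction_cutoff(values, fraction=0.05):
    """Smallest value that still falls within the top `fraction` of windows."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return ranked[n_top - 1]

def candidate_windows(fst, pi_ratio, fraction=0.05):
    """Indices of windows in the top fraction for both FST and the pi ratio."""
    fst_cut = top_fraction_cutoff(fst, fraction)
    pi_cut = top_fraction_cutoff(pi_ratio, fraction)
    return [
        i for i, (f, r) in enumerate(zip(fst, pi_ratio))
        if f >= fst_cut and r >= pi_cut
    ]
```

Requiring both statistics to exceed their cutoffs keeps only windows supported by two lines of evidence, at the cost of missing sweeps that are visible in only one of them.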