Background

The Qinghai-Tibet Plateau, known as the roof of the world, has unique environmental conditions, including high altitude, low temperature, large temperature fluctuations and strong ultraviolet radiation, which make the organisms endemic to this area ideal models for studying the molecular mechanisms of plateau adaptation [1,2,3,4,5,6,7]. Studies have shown that fish on the Qinghai-Tibet Plateau have undergone adaptive evolution [8,9,10,11]. Compared with other fish, their evolutionary rates are significantly accelerated in Gene Ontology (GO) terms such as oxidative stress response, vascular morphogenesis, glucose metabolism, and DNA repair. Many genes related to the hypoxia response and energy metabolism have been under selection [12,13,14,15,16]. As a dominant group, Schizothoracinae fishes have a distribution and evolutionary history closely linked to the uplift of the Qinghai-Tibet Plateau, which has attracted attention in the fields of phylogenetics, evolution and adaptation to extreme environments [17,18,19]. Previous studies have suggested that Schizothoracinae fishes formed new species and occupied new ecological niches through sympatric or allopatric speciation under geographical barriers and natural selection, but there are few empirical cases and further exploration is needed [20, 21].

Schizothorax o’connori is a species endemic to China that is widely distributed in the Yarlung Zangbo River and has great ecological and economic value. Previous studies have found that the uplift of the Tibetan Plateau changed the drainage systems of plateau rivers, and the Palong Zangbo River, a tributary of the lower Yarlung Zangbo River, may have been formed by river capture [22, 23]. The Palong Zangbo River population of S. o’connori may also have originated from the upper reaches of the Yarlung Zangbo Grand Canyon, but there is no evidence of gene exchange [22]. As a young tetraploid fish, S. o’connori recently experienced a fourth round of whole-genome duplication, and its evolutionary rate is higher than that of most Schizothorax fishes [1, 24]. Given the presence of the Yarlung Zangbo Grand Canyon, it is not clear whether the S. o’connori populations upstream and downstream of the canyon differ, or whether S. o’connori has evolved independently in different water environments. Furthermore, due to intensified human activities, the invasion of alien species and the construction of water conservancy and hydropower projects, the natural population of S. o’connori has decreased sharply in recent years [25,26,27]. At present, research on S. o’connori has mainly focused on age and growth, liver metabolism and pigment distribution [28,29,30,31,32]. However, there has been no genome-wide study of the population genetics of S. o’connori; its genetic diversity has been studied only with mtDNA sequences and microsatellite markers [22, 23]. Therefore, it is necessary to further explore evidence of differentiation in S. o’connori at the genomic level, providing a reference for research on the parallel evolution of plateau fish.

In recent years, with the publication of genome-wide data for more fish species, whole-genome resequencing has been widely used. It can be used not only to analyse the genetic diversity, phylogenetic relationships and historical population dynamics of different fish populations [33,34,35] but also to screen for genes and pathways related to environmental adaptation, which helps elucidate the genetic basis and molecular mechanisms of environmental adaptation in fish [36,37,38,39,40]. Whole-genome data for S. o’connori have been published, and the genome and transcriptome sequencing data were submitted to the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA557578 [24]. In the present study, the genetic diversity, population structure and environmental adaptation mechanism of seven wild populations of S. o’connori in the middle and lower reaches of the Yarlung Zangbo River were analysed by whole-genome resequencing. The results of this study provide an important basis for the resource protection of S. o’connori and for understanding its environmental adaptation.

Results

Data quality control

Using the Illumina NovaSeq 6000 sequencing platform, a total of 849.5 Gb of raw data and 803.9 Gb of clean data were obtained from 53 samples. The average sequencing depth was 10.99X. On average, 84.89%, 65.46%, and 43.65% of the reference genome was covered at depths of 1X, 5X, and 10X, respectively. Xiao et al. [24] shared the genome assembly and annotation with us, which served as the reference genome of S. o’connori. The average mapping rate of reads to the reference sequence (MapPERate) was 93.27%, and the average rate of properly aligned reads, that is, read pairs with the expected insert size and orientation (ProperlyRate), was 78.59%. The average coverage of the reference sequence was 90.72%. The average Q20 was 97.40%, and the average Q30 was 91.42% (Supplementary Table 1).

Cluster analysis of differential SNPs and InDels

A total of 23,602,746 SNPs (single nucleotide polymorphisms) and 6,671,199 InDels (insertion/deletion polymorphisms) were identified by variant detection and filtering. Transitions (AT → GC or GC → AT) were the main mutation type (Fig. 1A). In addition, most SNPs were in intergenic (59.55%) and intron regions (31.97%), followed by exons (3.38%), downstream (2.46%), and upstream regions (2.46%) (Table 1). The minor allele frequency (MAF) analysis showed that SNPs and InDels were concentrated at low MAF values (Fig. 1B, C). Both SNPs and InDels were densely distributed on chromosomes 2 and 23, which suggests that these two chromosomes may play an important role in evolution (Fig. 1D, E).

Fig. 1

SNP and InDel information. A. SNP marker type distribution; B. Minor allele frequency (MAF) distribution of SNPs; C. MAF distribution of InDels; D. Density distribution of SNPs within 1 Mb on the chromosome; E. Density distribution of InDels within 1 Mb on the chromosome. A, G, C and T represent adenine deoxynucleotides, guanine deoxynucleotides, cytosine deoxynucleotides and thymine deoxynucleotides, respectively. Chr1-24 represents chromosomes 1–24; SNPs represent single nucleotide polymorphisms; InDels represent insertions and deletions; and Mb represents megabase

Table 1 Number of SNPs and InDels in different regions of the genome

Population genetic diversity analysis

According to the SNP data, the observed heterozygosity (HO), expected heterozygosity (HE), nucleotide diversity (π) and polymorphism information content (PIC) were 0.1878–0.2167, 0.1414–0.1638, 0.0019–0.0020 and 0.01143–0.0348, respectively (Table 2). The average values of HO, HE, π and PIC were lowest in the LZ population, indicating that its genetic diversity was the lowest. The HO, HE and PIC of the ZG population were higher than those of the other populations, and its π was second only to that of the ML population, indicating that the ZG population had the highest genetic diversity. The genetic diversity of the DG, JC and ZM populations was similar, while that of the BM population was closest to that of the DG population.

Table 2 Genetic diversity parameters among Schizothorax o’connori populations

The pairwise genetic fixation index (Fst) between populations ranged from 0.0132 to 0.0437 (Table 3). The Fst values between the ML population and the other six populations were relatively high, ranging from 0.0346 to 0.0437, followed by those of the LZ population, which ranged from 0.0230 to 0.0437. Overall, the genetic differentiation among the seven populations was very weak.

Table 3 Genetic fixation index between seven populations of Schizothorax o’connori

Phylogeny and population structure

According to the neighbour-joining (NJ) phylogenetic tree, individuals from the seven populations were intermingled, and no population clustered into a single branch (Fig. 2). However, at the population level, the ZG population was completely separated from the BM, ML, and LZ populations, with only one individual interspersed with the JC population. The ML, LZ and DG populations were completely separated from one another and did not cluster on the same branch. The JC and ZM populations were dispersed among the other populations, especially the JC population, which overlapped with every other population.

Fig. 2

The neighbour-joining phylogenetic tree of Schizothorax o’connori. ZG, Zangga; DG, Dagu; ZM, Zangmu; JC, Jiacha; ML, Milin; LZ, Linzhi; BM, Bomi

Phylogenetic analysis showed that the populations from the upstream part of the sampling section (ZG population) and those from the downstream part (ML and LZ populations) and the Palong Zangbo tributary (BM population) were differentiated into two genetic lineages. The DG, ZM and JC populations in the middle of the sampling section were scattered across the two genetic lineages, especially the JC population. This suggests that the phylogenetic relationships were related to geographical location.

Across the different K values, the assignment of individuals to subgroups showed little correspondence with the river section where they were sampled. Only under the assumption of a single ancestral population (K = 1) could the 53 samples be clustered together (Fig. 3A). Furthermore, based on the cross-validation error rates of the different K values (Fig. 3B), K = 1 was selected as the optimal number of clusters; that is, the 53 samples in this experiment formed a single population at the level of population genetic structure analysis.

Fig. 3

Principal component analysis and the error rate of the Schizothorax o’connori admixture K value. A The error rate of the S. o’connori admixture K value by cross-validation; B the clustering results of S. o’connori; C Principal component analysis for S. o’connori. ZG, Zangga; DG, Dagu; ZM, Zangmu; JC, Jiacha; ML, Milin; LZ, Linzhi; BM, Bomi

Principal component analysis

Principal component analysis (PCA) showed that the contribution rates of the second principal component (PC2) and the third principal component (PC3) were 30.10% and 29.80%, respectively (Fig. 3C), and the seven populations could not be separated. According to the PCA output, the 53 samples in this study could not be clustered into different subgroups.

Gene flow and linkage disequilibrium analysis

Based on the results of the genetic differentiation and cluster analyses, gene flow and linkage disequilibrium analyses were performed only for the ZG, BM, LZ and ML populations (Fig. 4A). When one or two migration events were allowed, gene flow was detected only from the ML population to the BM population, and no gene flow was detected among the other populations, which was consistent with the results of the cluster analysis. This suggests that the ML and BM populations originated from a single ancestor.

Fig. 4

Further analysis of the four populations and redundancy analysis. A gene flow analysis; B linkage disequilibrium analysis; C estimation of historical effective population size for four populations; D redundancy analysis based on six factors (Bio1: annual mean temperature; Bio3: isothermality; Bio12: annual precipitation; Bio13: precipitation of wettest month; Bio15: precipitation seasonality; and altitude). ZG, Zangga; DG, Dagu; ZM, Zangmu; JC, Jiacha; ML, Milin; LZ, Linzhi; BM, Bomi

LD analysis was conducted for four populations. The LD coefficients of the BM and ZG populations at 0.8 kb on the genome were approximately 0.13, the ML at 2.2 kb on the genome was approximately 0.28, and the LZ at 1.8 kb on the genome was approximately 0.23 (Fig. 4B). The decay rate of the LD coefficient in the four populations was BM = ZG > LZ > ML. The decay distance of the BM and ZG populations was the smallest, which may be due to the high genetic diversity of these two populations. This was consistent with the results of the genetic diversity analysis.

Effective population size analysis

The historical effective population sizes and their trends were broadly consistent between the ML and LZ populations and between the BM and ZG populations (Fig. 4C). Approximately 8.0 Ma, the effective population sizes of the ML, LZ, BM, and ZG populations were still at a relatively high level. However, from approximately 8.0 to 1.0 Ma, all four populations showed a downward trend. The BM, ZG, LZ and ML populations experienced rapid expansion at approximately 1.0–0.13 Ma, 1.0–0.13 Ma, 0.18–0.019 Ma and 0.14–0.019 Ma, respectively. Then, the effective population size of the BM and ML populations remained almost unchanged from approximately 0.019 to 0.01 Ma. In contrast, the ZG and BM populations rapidly decreased from approximately 0.013 to 0.01 Ma and 0.13 to 0.015 Ma, respectively. Subsequently, the BM population slowly expanded from 0.015 to 0.01 Ma.

Genome-environment association analysis and functional annotation of selected SNPs

Five climatic factors (Bio1: annual mean temperature; Bio3: isothermality; Bio12: annual precipitation; Bio13: precipitation of wettest month; Bio15: precipitation seasonality) and altitude were selected for genome-environment association analysis (GEA). The first two components explained 22.30% and 17.01% of the variance, respectively, and the six factors were significantly associated with genotype (P = 0.001, adjusted R² = 0.3277) (Fig. 4D). Among them, altitude had the strongest correlation with population genetic distribution. On the plateau, altitude is closely related to temperature and ultraviolet radiation, which provides a reference for the analysis of selected sites.

Based on the lnθπ ratio and FST values, the selected SNPs were functionally annotated using KEGG signalling pathways and GO terms (p ≤ 0.05) (Fig. 5, Table 4). Across the pairwise comparisons of all populations, 77 significant KEGG pathways were identified, and the most common pathways included olfactory transduction, the Fanconi anaemia pathway, and glycine, serine and threonine metabolism (Supplementary Table 2).

Fig. 5

The top 30 KEGG pathways significantly enriched in Schizothorax o’connori

Table 4 Significantly selected pathways and genes among Schizothorax o’connori populations

A total of 86 significant GO terms were determined from pairwise comparisons of all populations, such as olfactory receptor activity, cellular protein modification process, peptidase activity and ubiquitin-protein transferase activity (Supplementary Table 3).

Extended haplotype homozygosity between the populations

Based on the XP-EHH values of each locus combined with p values, the top 5 regions were selected as candidate regions. Enrichment analysis was performed on the candidate genes in these regions (Supplementary Tables 4 and 5). The XP-EHH results were similar to those of the selective sweep analysis. The selected terms were mostly related to DNA repair and energy metabolism. Only one example is shown here. In the comparison of the BM and ML populations, more than 700 candidate genes were extracted from the top 5 regions. The enriched KEGG and GO terms mostly included fucosyltransferase activity, nucleosome, endopeptidase inhibitor activity, and nucleosome assembly (Figs. 6 and 7).

Fig. 6

GO functional enrichment results of candidate genes in the Bomi (BM) vs. Milin (ML) populations

Fig. 7

KEGG functional enrichments of candidate genes in the Bomi (BM) vs. Milin (ML) populations

Discussion

Maintaining genetic diversity within species and populations is important for their long-term survival and health, as it provides the ability to adapt to changing environmental conditions [41,42,43,44]. The average level of heterozygosity and genetic diversity is positively correlated with environmental adaptability [43, 45]. In this study, the π of the seven populations of S. o’connori was low, and the PIC values indicated low polymorphism. However, HO (0.2107) and HE (0.1577) were higher than those in S. kozlovi (HO 0.09578 and HE 0.06743) [46] but lower than those in most schizothoracids, such as S. lissolabiatus (HO 0.2695 and HE 0.2892) [47] and S. curvilabiatus (HO 0.2489 and HE 0.2931) [48]. This may be because S. o’connori has remained in a stable state after experiencing bottleneck effects and has not undergone population expansion, thus exhibiting low genetic diversity [49,50,51,52]. A similar explanation has been proposed for the low genetic diversity of the Schizopygopsis pylzovi population [53]. In addition, changes in the aquatic environment and the decline in population resources caused by overfishing might also contribute to the low genetic diversity of S. o’connori [25, 26]. Therefore, it is necessary to increase monitoring and protection of S. o’connori, especially the LZ population.

Fst and cluster analysis can determine the level of genetic differentiation and genetic relationships between different populations [54, 55]. Natural fish populations gradually differentiate when geographical isolation restricts gene exchange between them. For example, there was no barrier between the habitats of the Taiwan and Guangdong populations of the oblique banded grouper, but the straight-line distance between them was approximately 600 km, and the two populations showed geographical differentiation [56]. Based on the classification standard of Fst values, the genetic differentiation between the seven populations of S. o’connori was relatively small (0 < Fst < 0.05). However, the ZG population was completely separated from the LZ, ML and BM populations but overlapped with the DG, ZM and JC populations, which may be due to geographical isolation. In this study, the ZG population was approximately 340 km away from the ML population and 630 km away from the BM population. Populations separated by long distances experienced different habitat environments and a long-term lack of exchange, leading to such population differentiation. Although there was a dam barrier between ZG and DG and between ZM and JC, the dams were built relatively recently (construction started between 2010 and 2020) [57]. Thus, the dams are unlikely to be the cause of the current population differentiation, but they may lead to differentiation between populations upstream and downstream of the dams in the long term.

Numerous studies have shown that ancient climate and geological events might be important factors affecting the current geographical distribution and genetic differentiation of species on the Qinghai-Tibet Plateau [58,59,60,61]. In this study, the effective population size of S. o’connori decreased for a short time during the late accelerated uplift stage of the Qinghai-Tibet Plateau (~1 million years ago) and during the Quaternary Ice Age [24, 62, 63]. Therefore, we speculate that the geological movement and temperature decrease during this period placed great pressure on the survival of S. o’connori, which led to a sharp decline in the population. In addition, approximately 4 million years ago (Ma), the Palong Zangbo River may not have been a tributary of the lower Yarlung Zangbo River but part of its main stream [23]. Before the ancient Yarlung Zangbo River was captured, the Palong Zangbo River belonged to the lower reaches of the main stream of the Yarlung Zangbo River and was connected with the upper reaches of the Grand Canyon, so there was extensive gene exchange among S. o’connori in this section without geographical barriers [23, 64,65,66]. After the Grand Canyon formed, geographical barriers such as waterfalls still allowed populations immediately upstream of the canyon to move downstream and exchange genes with downstream populations, whereas the Palong Zangbo River population could not exchange genes with populations upstream of the canyon [23, 67]. This largely explains the connection between the BM and ML populations observed in the gene flow and phylogenetic tree analyses. It also indicates that our hypothesis that the Palong Zangbo River population (Bomi) evolved independently is not supported.

In addition, a previous study used Cytb + control region sequences to examine different S. o’connori populations and found significant genetic differentiation between the BM population and the population upstream of the Grand Canyon [22]. However, this study found that, at the genomic level, the BM population showed differentiation only from the ZG population, suggesting that the genetic differentiation of different populations at the molecular level was gradual.

Revealing the genetic basis of the adaptation of plateau fish to extreme environments could further our understanding of their adaptive evolution under environmental changes [68,69,70]. In this study, we used selective sweep analysis and XP-EHH to identify potential features of high-altitude adaptation in S. o’connori. In both analyses, the selected SNP sites were mostly concentrated in pathways such as olfactory transduction, the Fanconi anaemia pathway, and amino acid metabolism, which are mostly related to DNA repair and energy metabolism.

In extremely high-altitude environments, both low temperature and ultraviolet radiation can cause DNA damage [71,72,73,74]. DNA damage repair plays an important role in maintaining DNA integrity and stability. The GO terms enriched among positively selected genes in S. malacanthus and S. pylzovi were mainly related to DNA repair, which helps them adapt to high-altitude environments with strong ultraviolet radiation [75]. Pathways related to DNA repair, such as homologous recombination and the p53 signalling pathway, were under selection in Triplophysa bleekeri, forming an integrated DNA repair mechanism to cope with extremely high-altitude environments [76]. The reptiles [77], mammals [78], and birds [79] living on the Qinghai-Tibet Plateau have also undergone adaptive evolution in DNA repair. The above results suggest that high-altitude animals have undergone convergent adaptive evolution in DNA damage repair function.

Fish undergo adaptive evolution in energy metabolism in environments with year-round low water temperatures and large temperature differences between day and night [80,81,82]. Gymnodiptychus pachycheilus exhibits accelerated genome evolution, and the genes showing rapid evolution and positive selection in its lineage are enriched in functions related to energy metabolism [83]. Compared with lowland fish such as Ctenopharyngodon idellus, the dN/dS values of all schizothoracine fishes from different altitudes were significantly increased, and the evolutionary rates of some GO terms associated with altitude adaptation, including energy metabolism, hypoxia response, and DNA repair, were also significantly accelerated [8]. Thus, the fish on the Qinghai-Tibet Plateau have undergone a rapid evolutionary process. Aquatic animals living in other cold regions have similar evolutionary strategies. For example, to adapt to the cold and highly seasonal Antarctic environment, intergenic transposable elements in Antarctic krill have expanded and formed a large genome; moreover, the gene families related to moulting and energy metabolism have also expanded [84]. In this study, we found that the selected genes of S. o’connori were enriched in pathways such as amino acid metabolism and sugar metabolism. Amino acid and sugar metabolism are important metabolic pathways that play an essential role in regulating metabolism and meeting energy needs. This suggests that S. o’connori has undergone adaptive evolution in high-altitude, low-temperature environments.

Determining conservation priorities and units based on the genetic diversity of different populations is essential for proposing targeted conservation strategies [85]. It is generally believed that populations with higher diversity have greater evolutionary potential, better adaptability to environmental changes, and higher conservation value [22]. In this study, the seven populations of S. o’connori did not show significant differentiation. However, in the phylogenetic tree analysis, the ML, LZ and BM populations were more concentrated, and the other four populations were less concentrated. Therefore, we propose dividing the seven populations into two management units: the BM, LZ, and ML populations near the Grand Canyon as one management unit and the ZG, DG, ZM, and JC populations as another. Moreover, among the seven populations, the ZG population had the highest genetic diversity and could be prioritized for protection. We suggest establishing small-scale S. o’connori reserves, including strengthening fishery supervision and habitat assessment in these areas [86]. For the other populations, fishing restrictions should be strictly enforced and fish stocking and other related work should be carried out, which is crucial for maintaining effective population size and genetic variation.

Conclusion

In this study, we analysed the population structure and adaptive characteristics of S. o’connori upstream and downstream of the Yarlung Zangbo Grand Canyon. There was no significant genetic differentiation between the seven populations, but in cluster analysis, the upstream Zangga population and the downstream Milin, Linzhi, and Bomi populations could be separated, indicating that geographical distance and waterfalls affected the genetic differentiation of the populations. The Bomi population was more closely related to the Milin and Linzhi populations, which may be due to the late-stage accelerated uplift of the Qinghai-Tibet Plateau and the impact of the Quaternary Ice Age, which caused the Bomi population to separate from the main stream and become a tributary population. In addition, the selected sites were enriched in energy metabolism and DNA repair pathways, which may help S. o’connori adapt to the low temperature and strong ultraviolet radiation at high altitude. These results provide a basis for the resource protection of S. o’connori and for understanding its adaptability.

Materials and methods

Sample collection

From May to July 2022, 53 samples of S. o’connori were collected from seven sampling sites in the middle reaches of the Yarlung Zangbo River, including the Zangga Village section (ZG), Dagu Hydropower Station reservoir section (DG), Zangmu Hydropower Station reservoir section (ZM), Jiacha Hydropower Station reservoir section (JC), Milin section (ML), Linzhi section (LZ) and the Bomi section of the Palong Zangbo tributary (BM). The sample information is shown in Table 5 and Fig. 8. All fin samples were stored in 95% ethanol and kept in a freezer at -20 °C.

Table 5 Information on Schizothorax o’connori samples
Fig. 8

Sampling locations of Schizothorax o’connori in the Yarlung Zangbo River. SN, Shannan; SR, Sangri; ZG, Zangga; DG, Dagu; ZM, Zangmu; JC, Jiacha; ML, Milin; LZ, Linzhi; BM, Bomi

Whole-genome resequencing

DNA was isolated from the tissues and evaluated by 1% agarose gel electrophoresis. DNA purity and concentration were measured with a NanoPhotometer spectrophotometer and a Qubit 2.0 Fluorometer, respectively. Qualified DNA was fragmented using an ultrasonic processor to an insert length of approximately 350 bp. Then, end repair, A-tailing, sequencing adapter ligation, purification and polymerase chain reaction (PCR) amplification were performed to complete the preparation of the 350 bp library. Subsequently, the library was quantified with the Qubit 2.0 and diluted to 1 ng/μL, and an Agilent 2100 was used to check the insert size of the library. Then, real-time quantitative polymerase chain reaction (qPCR) was used to determine the effective concentration of the library. A library with a qPCR-measured concentration greater than 3 nmol/L was considered to have an effective concentration. Finally, the genomic DNA was sequenced on the Illumina NovaSeq 6000 platform using the PE150 strategy at Wuhan Wanmo Technology Co., Ltd.

Read alignment and SNP calling

Fastp software [87] was used to perform quality filtering on the raw data generated by high-throughput sequencing to obtain clean reads for analysis. Then, clean reads were aligned to the reference genome using the bwa mem algorithm in BWA (Version: 0.7.12) [88]. The reference genome and annotation information were provided by Xiao et al., who published the whole genome of S. o’connori [24]. The alignments were converted and sorted using SAMtools software (Version: 1.9) [89], and duplicate reads were then marked using MarkDuplicates in Picard software (Version: 2.18.17) [90]. Alignment statistics were computed using Qualimap software [91].

Based on the alignments of each sample to the reference genome, SNP and InDel detection was performed using the HaplotypeCaller module of GATK software (Version: 3.8.1) [92], generating a gVCF file for each sample. Then, population-level SNP and InDel detection was performed using the GenotypeGVCFs module, and the resulting SNP and InDel data were strictly filtered. The filtering criteria were as follows [34, 93,94,95]: (1) GATK filtering parameters: QD < 2.0 | MQ < 40.0 | FS > 60.0 | SOR > 3.0 | MQRankSum < -12.5 | ReadPosRankSum < -8.0; sites that met any of these criteria were excluded; (2) allele type: SNP loci are generally biallelic, so loci with more than two alleles were removed; (3) minor allele frequency (MAF): SNP sites with MAF less than 0.05 were removed. The filtered SNP data were annotated using SnpEff (Version 4.3T) [96] software combined with annotation information from the reference genome.
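As a minimal sketch of how the three filtering criteria above combine, the following Python functions evaluate a single site. The function names, the dictionary input and the handling of missing annotations are illustrative assumptions; the actual filtering in this study was carried out with GATK and is not reproduced here.

```python
# Thresholds copied from the filtering criteria listed in the text.
HARD_FILTERS = [
    ("QD", lambda v: v < 2.0),
    ("MQ", lambda v: v < 40.0),
    ("FS", lambda v: v > 60.0),
    ("SOR", lambda v: v > 3.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]


def fails_hard_filters(annotations):
    """Return True if the site meets any exclusion criterion.

    `annotations` is a hypothetical dict of annotation name -> value.
    Assumption: a missing annotation does not trigger its filter.
    """
    return any(
        name in annotations and is_bad(annotations[name])
        for name, is_bad in HARD_FILTERS
    )


def keep_site(annotations, n_alt_alleles, maf, min_maf=0.05):
    """Apply criteria (1)-(3): hard filters, biallelic only, MAF >= min_maf."""
    if fails_hard_filters(annotations):
        return False
    if n_alt_alleles != 1:  # more than two alleles at the locus
        return False
    return maf >= min_maf
```

For example, a biallelic site with QD = 10 and MAF = 0.2 is kept, whereas the same site with QD = 1.5, a second alternate allele, or MAF = 0.01 is discarded.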

Genetic diversity analysis

The parameters of genetic diversity were calculated using VCFTOOLS4.0 [97], including nucleotide diversity (π), observed heterozygosity (HO), expected heterozygosity (HE), and polymorphism information content (PIC). The genetic differentiation index (Fst) and analysis of molecular variance (AMOVA) between populations were calculated using ARLEQUIN 3.5.1.3 [98].
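To make these definitions concrete, the sketch below computes HO, HE and PIC for one biallelic SNP from per-individual alternate-allele counts. It uses the textbook formulas (HE = 1 − Σp², and PIC = HE − 2p²q² for two alleles) without small-sample correction; the function name and input layout are assumptions, and the software used in this study may differ in detail.

```python
def diversity_stats(genotypes):
    """Per-site HO, HE and PIC for a biallelic SNP.

    `genotypes` is a list with one entry per individual giving the number
    of alternate alleles carried (0, 1 or 2); missing data are not handled.
    """
    n = len(genotypes)
    q = sum(genotypes) / (2 * n)   # alternate allele frequency
    p = 1.0 - q                    # reference allele frequency
    ho = sum(1 for g in genotypes if g == 1) / n   # observed heterozygotes
    he = 1.0 - (p * p + q * q)                     # expected heterozygosity
    pic = he - 2.0 * p * p * q * q                 # PIC for two alleles
    return ho, he, pic
```

For four individuals with genotypes 0, 1, 1 and 2, the allele frequencies are p = q = 0.5, giving HO = 0.5, HE = 0.5 and PIC = 0.375. Population-level values would then be averages over sites.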

Phylogenetic analysis

PLINK [99] was used to filter the genome-wide SNPs for linkage disequilibrium (LD) with the parameter “--indep-pairwise 50 10 0.2”, so that tightly linked SNPs were removed. Subsequent phylogenetic tree analysis, principal component analysis, and population structure analysis were conducted based on these SNPs.
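The idea behind this pruning step can be illustrated with a simplified greedy procedure: a SNP is retained only if its squared genotype correlation (r²) with every already retained SNP in the preceding window does not exceed the threshold. This is a sketch under stated assumptions and not PLINK's exact algorithm, which slides the window in steps and chooses which SNP of a linked pair to drop differently; the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:       # monomorphic site: correlation undefined
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def greedy_ld_prune(snps, window=50, threshold=0.2):
    """Return indices of SNPs kept after greedy pairwise LD pruning.

    `snps` is a list of genotype vectors ordered along the chromosome.
    The defaults mirror the window size (50 SNPs) and r2 cut-off (0.2)
    given in the text; the step size of 10 is not modelled here.
    """
    kept = []
    for i, genotypes in enumerate(snps):
        linked = any(
            r_squared(genotypes, snps[j]) > threshold
            for j in kept
            if i - j < window
        )
        if not linked:
            kept.append(i)
    return kept
```

With three SNPs where the second is identical to the first (r² = 1) and the third is uncorrelated with it (r² = 0), the procedure keeps the first and third.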

ADMIXTURE [100] software was used to perform population structure analysis from K = 1 to K = 10. For each K, 10 replicate runs were performed with 10 different seeds, and the results of the 10 runs were then clustered using pong [101]. The optimal K value, that is, the optimal number of clusters, was taken as the K with the lowest cross-validation (CV) error. Population gene exchange and differentiation were analysed using TreeMix V1.12 [102].
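The selection of the optimal K can be sketched as follows. The code assumes that the ADMIXTURE log lines of the form "CV error (K=2): 0.431" have already been collected into a list of strings, and it averages replicate runs for each K before taking the minimum; the averaging step and the function name are assumptions, and the numbers in the usage example are invented for illustration only.

```python
import re

# Pattern of the cross-validation line written to the ADMIXTURE log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def best_k(log_lines):
    """Return (optimal K, mean CV error per K) from collected log lines."""
    errors = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            k = int(match.group(1))
            errors.setdefault(k, []).append(float(match.group(2)))
    # Average over replicate runs (different seeds) for each K.
    mean_errors = {k: sum(v) / len(v) for k, v in errors.items()}
    return min(mean_errors, key=mean_errors.get), mean_errors


# Invented values, for illustration only.
example = [
    "CV error (K=1): 0.412",
    "CV error (K=1): 0.414",
    "CV error (K=2): 0.431",
    "CV error (K=3): 0.455",
]
optimal_k, _ = best_k(example)
```

In this invented example the lowest mean CV error occurs at K = 1, which is the pattern that would lead to treating all samples as a single cluster.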

A neighbour-joining (NJ) phylogenetic tree was constructed using MEGA v11 [103, 104]. The distance matrix was calculated using TreeBest software [105], and the reliability of the NJ tree was tested using the bootstrap method (1000 replicates) [106]. Based on SNP markers, principal component analysis (PCA) was performed using PLINK software to obtain the two most influential eigenvectors [99].

Linkage disequilibrium and population historical dynamics analysis

PopLDdecay (Version 3.40) [107] was used for linkage disequilibrium (LD) analysis. The parameters were -MaxDist 500 and -MAF 0.05, and the other parameters were left at their defaults. PSMC software [108] was used to analyse the population dynamic history. The PSMC parameters were -N 30 -t 5 -r 5 -p "4+30*2+4+6+10", the generation time (g) was 2 years, and the per-base mutation rate (μ) was 2.5 × 10⁻⁸.
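The generation time and mutation rate enter the analysis only when the scaled PSMC output is converted to years and effective population size. A minimal sketch of this conversion is given below, assuming the standard PSMC scaling in which N0 = θ0 / (4μs), time in years is 2·N0·t·g and Ne = λ·N0. The bin size s = 100 bp is the PSMC default and is an assumption here, as are the function and argument names.

```python
def scale_psmc(theta0, t_k, lambda_k, mu=2.5e-8, gen_years=2.0, bin_size=100):
    """Convert one scaled PSMC interval to (years before present, Ne).

    theta0   : scaled mutation rate per bin estimated by PSMC
    t_k      : scaled time of the interval (units of 2*N0 generations)
    lambda_k : relative population size of the interval
    mu, gen_years : mutation rate and generation time given in the text
    bin_size : bases per bin in the PSMC input (assumed default of 100)
    """
    n0 = theta0 / (4.0 * mu * bin_size)   # reference effective size N0
    years = 2.0 * n0 * t_k * gen_years    # time in years
    ne = lambda_k * n0                    # effective population size
    return years, ne
```

For example, with the hypothetical values θ0 = 0.001, t = 0.5 and λ = 3, the reference size is N0 = 100, so the interval corresponds to 200 years before present with Ne = 300.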

Analysis of genome-environment association and selective sweep regions

Nineteen bioclimatic variables were downloaded from WorldClim (https://www.worldclim.org/data/worldclim21.html), and the Spearman correlation coefficient between each pair of the 19 variables was calculated. When the correlation between two variables was larger than 0.8, one of them was removed [109]. Redundancy analysis (RDA) was used for the environmental association analysis and was performed with the rda function in the vegan R package (Version: 2.5) [91, 109,110,111].
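The variable selection step can be sketched as a greedy filter: variables are visited in order, and a variable is kept only if its Spearman correlation with every variable already kept does not exceed 0.8. Using the absolute value of the correlation and keeping the first variable of each correlated pair are assumptions, since the text does not state which member of a pair was removed; the values in the usage example are invented.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def drop_correlated(variables, threshold=0.8):
    """Keep variables whose |rho| with all kept variables is <= threshold."""
    kept = []
    for name, values in variables.items():
        if all(abs(spearman(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Invented site values, for illustration only.
example = {
    "Bio1": [1, 2, 3, 4, 5],
    "Bio3": [2, 4, 6, 8, 11],   # monotone in Bio1, so rho = 1
    "Bio12": [3, 1, 4, 2, 5],   # rho with Bio1 is 0.5
}
retained = drop_correlated(example)
```

In this invented example Bio3 is dropped because it is perfectly rank-correlated with Bio1, while Bio12 is retained.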

The selected genomic regions were screened using a combination of the lnθπ ratio and the FST value. VCFtools software [112] was used to calculate the lnθπ ratio and FST value in sliding windows, with a window size of 100 kb and a step size of 10 kb. Regions that fell in the top 5% or bottom 5% of the lnθπ ratio and also in the top 5% of the FST values were taken as the selected regions; combining the two statistics screens for stronger selection signals, from which the target genes were obtained. In addition, the R package rehh was used to detect genomic segments differentiated between populations using the cross-population extended haplotype homozygosity (XP-EHH) method [113], and the sites with p < 0.05 after false discovery rate (FDR) correction were considered the selected sites.
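The joint outlier criterion can be sketched as a set intersection over windows. The code below assumes that per-window FST and lnθπ ratio values are already available as two lists of equal length; the function names and the rank-based definition of "top 5%" (the ceil(0.05 × n) most extreme windows) are illustrative assumptions.

```python
import math


def top_fraction(values, frac):
    """Indices of the ceil(frac * n) largest values (at least one)."""
    n_top = max(1, math.ceil(frac * len(values)))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:n_top])


def selected_windows(fst, ln_ratio, frac=0.05):
    """Windows in the top `frac` of FST and in either tail of the ln ratio."""
    high_fst = top_fraction(fst, frac)
    high_ratio = top_fraction(ln_ratio, frac)
    low_ratio = top_fraction([-v for v in ln_ratio], frac)
    return sorted(high_fst & (high_ratio | low_ratio))


def window_starts(chrom_length, size=100_000, step=10_000):
    """Start coordinates (0-based) of full sliding windows on a chromosome."""
    return list(range(0, max(chrom_length - size, 0) + 1, step))
```

Taking the union of the two tails of the lnθπ ratio reflects that a sweep may have reduced diversity in either of the two populations being compared, while the FST criterion requires the window to be highly differentiated in both cases.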

The selected sites were compared to 6 databases for annotation, including Nonredundant (NR), Nucleotide sequences (NT), Universal Protein (UniProt), Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology (GO) and Evolutionary genealogy of genes: Nonsupervised Orthologous Groups (EggNOG).

For functional enrichment analysis, all selected SNPs were mapped to terms in the GO and KEGG databases [114,115,116]. Then, with p < 0.05 as the threshold, the significantly enriched GO terms and KEGG pathways were searched in the selected SNPs.
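The text does not state which statistical test was used for the enrichment analysis. As an illustration only, a common choice is the one-sided hypergeometric test, sketched below; the function and argument names are hypothetical, and the p values reported in this study were not necessarily obtained in this way.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_selected, n_overlap):
    """P(X >= n_overlap) under the hypergeometric distribution.

    n_background : annotated items in the background set
    n_term       : background items annotated to the GO term or pathway
    n_selected   : selected items (for example, genes in selected regions)
    n_overlap    : selected items annotated to the term
    """
    upper = min(n_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For a background of 10 items of which 3 belong to a term, drawing 2 selected items gives a probability of 24/45 ≈ 0.533 of observing at least one term member, so a single overlap would not be considered enriched at p < 0.05.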