Background

Understanding how levels of genetic variation influence the survival of threatened species is of fundamental interest to evolutionary and conservation biologists because many natural populations are threatened by intense reduction and fragmentation of habitat, leading to isolation, declining populations, and decreasing genetic diversity [1, 2]. Loss of genetic diversity may increase the risk of extinction due to decreased reproductive fitness, decreased adaptive flexibility, and increased disease susceptibility [3]. Clarifying the mechanism that determines genetic variation in small, isolated populations is therefore essential for their conservation [4]. An important assumption in conservation genetics is that small, isolated populations are more sensitive to genetic drift and inbreeding [5, 6]. Genetic drift is the random fluctuation of allele frequencies over time; thus, adaptive alleles may be lost and deleterious alleles could be fixed in the population. The small population size and fixation of deleterious alleles leads to inbreeding depression and reduction of individual fitness, which decreases viability and compromises a population’s evolutionary adaptive potential [6].

However, some functionally important genes that are maintained by balancing selection, such as major histocompatibility complex (MHC) genes, may have a different evolutionary pattern compared with neutral markers. The multi-gene MHC family is found in vertebrates, codes for cell surface glycoproteins, and is important in animal conservation due to its role in resisting pathogens [7]. Compared to nearly neutral markers such as microsatellite loci or mitochondrial DNA, which are informative for phylogenetic and phylogeographic reconstructions [8], MHC variability is believed to determine the capability of individuals to resist continuously evolving pathogens and parasites. Consequently, MHC variability is a reflection of the processes that are related to adaptive evolution within and between populations [9]. Thus, most variation at MHC loci reflects the effects of balancing selection [10], which is the main mechanism for retaining high MHC genetic diversity. Balancing selection includes frequency-dependent selection, overdominance and diversifying selection and promotes long evolutionary persistence of individual alleles and strongly differentiated allelic lineages in mammals [2]. Besides balancing selection, intragenic recombination has also been suggested as one evolutionary mechanism for generation of MHC sequence diversity [11, 12]. Still, several significant questions remain. Current results conflict or are unclear with regard to the relative roles of balancing selection and genetic drift in maintaining MHC polymorphism in small, isolated, or severely bottlenecked populations [13, 14]. Studies of guppies, Mexican wolves and Namibian leopards found that even when genetic diversity at neutral markers was poor, polymorphisms in MHC were still maintained by balancing selection [1317]. In contrast, other research has found low levels of detectable polymorphisms for MHC genes in populations with lower diversity in neutral markers, including studies on fallow deer (Cervus dama) [18], northern elephant seals (Mirounga angustirostris) [19], great crested newt (Triturus cristatus) [20], and the black-footed rock-wallaby (Petrogale lateralis lateralis) [21]. These results suggest that compared to genetic drift balancing selection is relatively weak in small populations, leading to reduced variation at some MHC loci [22]. To elucidate the role of balancing selection and genetic drift in populations, FST outlier analysis [23] is widely used because demographic processes affect neutral loci and lead to population differentiation (measured by FST) [24]. The FST values are computed for all genes to distinguish the genes under selection from those non-selective genes [25].

We studied MHC and neutral genetic variation in the golden snub-nosed monkey (Rhinopithecus roxellana), an endangered primate endemic to China where it inhabits three isolated areas: Sichuan and Gansu provinces (SG); the Qinling Mountains, Shaanxi province (QL) and the Shennongjia Forestry District, Hubei province (SNJ) (Figure 1). Current census data suggest that fewer than 22,000 individuals remain (about 15,000 individuals in SG, 5500 individuals in QL and 1000 individuals in SNJ) [26], and mitochondrial DNA and microsatellite studies indicate that the genetically distinct SNJ population has very low genetic variation [27, 28]. The current effective population size (Ne) in SNJ is approximately 80 individuals [29]. The small size and isolation that typically characterize local populations make the golden snub-nosed monkey particularly susceptible to loss of genetic variation through inbreeding and genetic drift [30, 31]. In this study, we 1) investigated population structure and polymorphism levels of 16 microsatellite loci and 2 MHC II genes, 2) tested for both selection at the MHC loci and patterns of between population differentiations, and 3) evaluated the role of balancing selection and genetic drift in populations.

Figure 1
figure 1

Distribution of isolated snub-nosed monkey populations.

Results

Genetic variation at microsatellites

Genotyping of the 16 microsatellite loci revealed that the highest level of expected heterozygosity (HE) and observed heterozygosity (HO) was in the SG population, while the lowest level was in the SNJ population (Table 1, Additional file 1: Table S1). The Hardy–Weinberg equilibrium test showed no deviation from equilibrium after Bonferroni correction. No linkage disequilibrium occurred for pairs of loci in any population, and Micro-Checker revealed no evidence of stuttering, null alleles, and allele dropout.

Table 1 Population genetic parameters for populations estimated from microsatellite data

MHC variation, expression analyses, and recombination analysis

For each locus, we analysed 894 clones from 64 individuals that represented three monkey populations. We obtained 13 different DQA1 fragments with lengths that ranged from 436 bp to 444 bp including exon 2 and partial intron 2 and intron 3 [GenBank: JQ217094-JQ217106]. After performing BLAST with the HLA-DQA1 and Macaca mulatta DQA1 exon 2, we removed the intron and obtained 9 DQA1 exon 2 sequences with an equal length of 249 bp. These sequences were labelled as Rhro-DQA1*01–09 [GenBank: JQ217107-JQ217115] according to the nomenclature of Klein et al.[32] and were used for analysis. At the DQB1 locus, we identified 16 unique DQB1 exon 2 sequences that we called Rhro-DQB1*01–16 [GenBank: JQ217116- JQ217131].

Fragments of MHC loci were successfully amplified from the cDNA of 2 blood samples that were also used in DNA amplifications. At the DQA1 locus, 189 bp fragments of DQA1 exon 2 were obtained from 2 blood samples, and these fragments were part of the alleles obtained from DNA in the same individual [EMBL: HE616682- HE616683]. At the DQB1 locus longer fragments were obtained, including the entire exon 2, partial exon 1, and exon 3; however, these fragments were only obtained from 1 blood sample [GenBank: JQ217132-JQ217133]. Shorter fragments of DQB1 exon 2 were obtained from 2 blood samples [EMBL: HE616684- HE616686]. No sequences contained stop codons. The sequences from RNA were identical with that from DNA, which supports the hypothesis that these MHC PCR products were transcribed and expressed.

No more than two alleles were present per individual, suggesting that for each MHC locus one gene copy was sequenced with these primer sets. No stop codon or insertion/deletion was detected in DQA1 and DQB1. The measures of MHC diversity in the 3 populations are summarized in Table 2. All three populations shared four of the nine DQA1 alleles and two of the among 16 DQB1 alleles (Additional file 2: Table S2). The MHC sequences were highly divergent: 61 of 249 bp of DQA1 and 70 of 256 bp of DQB1 were variable. The Hardy–Weinberg equilibrium tests showed that the QL population deviated from equilibrium (p = 0.006). Linkage disequilibrium occurred in the SG population (p = 0.01). Population recombination analysis using LDhat revealed that the DQB1 locus had a higher recombination rate (ρ = 7) when all alleles were included between the MHC loci (all DQA alleles: ρ = 2).

Table 2 Summary of MHC variation in R. roxellana populations

Population structure and phylogenetic analysis

Microsatellite-based analyses detected significant genetic structures among the populations (Figure 2). Two genetic clusters were found (SG-QL, SNJ) with a clear maximum ΔKK = 95.500 at K = 2). The pairwise FST was the largest between SNJ and the other 2 populations among all of the FST values (Figure 3), which showed that the differences between SG and QL were minimal compared to those between SG-QL and SNJ. However, there was also a small ΔK peak at K = 5 (ΔK = 23.59), and results showed that in addition to the divergence between SG and QL, there was also divergence within the SG and QL populations. These results agree with earlier studies, where the differences within the SG and QL populations resulted from their different origins and habitat fragmentation [33, 34]. The clear population structure observed in microsatellites was not found at either the DQA1 locus or the DQB1 locus (Additional file 3: Figure S1). Similar to previous studies [7], the relationship among MHC alleles did not agree with the geographical locations of the 3 populations, as observed in the microsatellite results (Figure 4). DQA1/DQB1 alleles were not grouped by species but were mixed with other species (trans-species polymorphism (TSP). No clear clade could be identified from the phylogenetic tree at these two loci. Populations had more private alleles at DQB1 than DQA1 (Table 2). Four among 9 DQA1 exon 2 alleles were commonly shared among the 3 populations, while just 2 (Rhro-DQB1*01, 03) among 16 DQB1 exon 2 alleles were shared among all of the populations (Additional file 2: Table S2). However, frequency differences of these shared alleles were found between all population pairs (except Rhro-DQA1*01, 02). Comparisons of pairwise FST values of MHC and microsatellite sites are shown in Figure 3. The FST values showed lower divergence among populations compared with microsatellite loci (except SG-SNJ and QL-SNJ at the DQB1 locus). Half of the FST values at the MHC loci were higher than 0.05, indicating that differences existed among populations, especially between SG and SNJ. The population difference was more distinct at the DQB1 locus: the differentiations between the SG and SNJ populations and between the QL and SNJ populations were even larger than the differentiation at the microsatellite.

Figure 2
figure 2

Bayesian STRUCTURE clustering based on microsatellite genotypes among 3 snub-nosed monkey populations. (A) ΔK values as a function of K based on 10 runs, indicating the most likely number of 2 genetic clusters. (B) STRUCTURE assignment output at K = 2 and K = 5. The proportions of ancestry assigned to different clusters were plotted by individuals.

Figure 3
figure 3

Differences of pairwise F ST values among populations in MHC genes and microsatellite loci.

Figure 4
figure 4

Phylogenetic relationships among snub-nosed monkey MHC alleles. The phylogenetic tree was reconstructed using the ML inference method. Bootstrap values above 60 are given for each clade. The alleles obtained in this study are shown as red bars. Other allelic sequences, which were downloaded from GenBank are included in the analyses: Homo sapiens (HSU97555, AY375917, GQ422610, AY334565); Macaca fascicularis (AM086058-AM086060); Macaca mulatta (M81297, M81292, AJ308047, AJ308046); Pan troglodytes (M81260, M81262).

Historical selection

Evidence for historical balancing selection was detected. First, in the ABS, the number of nonsynonymous (dN) mutations was greater than the number of synonymous (dS) mutations using the Z-test (DQA1: dN/dS = 1.15, p = 0.405; DQB1: dN/dS = 4.41, p = 0.01) (Table 3). Although p-value was not significant for DQA1 locus in Mega, ABS sites under significant selection were detected in Paml 4 (Additional file 4: Table S3). After comparisons of various codon evolution models using Codeml in Paml 4, models integrating positive selection (M2a, M8, and M3) matched MHC better than the other models based on the Akaike information criterion (AIC) values (Tables 4 and 5). Under model M2a, 9 DQA1 sites and 5 DQB1 sites were exposed to significant selection. Under model M8, 11 DQA1 sites and 9 DQB1 sites were identified. Moreover, most of these sites were ABS sites and others located near ABS sites (Additional file 4: Table S3, Additional file 5: Table S4). The existing TSP at these loci provided additional evidence in support of historical balancing selection (Figure 4).

Table 3 Average non-synonymous substitutions per non-synonymous site ( d N ) and synonymous substitutions per synonymous sites ( d S )
Table 4 Results of maximum-likelihood models for exon 2 of the DQA1 gene
Table 5 Results of maximum-likelihood models for exon 2 of the DQB1 gene

FSToutlier analysis

The FST outlier analysis did not reveal a high or low level of differentiation among populations at the DQA1 and DQB1 loci compared with 16 microsatellite loci (Figure 5A). One microsatellite locus (D14S306) was located outside the 95% confidence interval. For each population, the MHC loci were also located within the 95% confidence interval of the microsatellites. Some satellites existed outside the 95% confidence interval: in the SG population, D14S306 was lower than the neutral level, and D1S1665 was at the edge of the neutral level (Figure 5B); in the QL population, D6S1056 was under the candidate positive selection level (Figure 5C); and in the SNJ population, D1S1665 was above the neutral level while D6S474 was below (Figure 5D).

Figure 5
figure 5

F ST outlier analysis. Circles are observed values from 16 microsatellite loci and MHC genes for (A) all 3 populations, (B) SG population, (C) QL population, and (D) SNJ population.

Discussion

Habitat isolation generates barriers to gene flow among populations that often result in loss of genetic diversity through genetic drift and inbreeding [2]. For vertebrates, genetic variation is of special importance in MHC genes due to the significant role they play in immune functions [35]. MHC diversity is presumed to improve parasite resistance, reproductive success, and population viability [36], and has been studied in species such as western gorilla (Gorilla gorilla) [37], brown bear (Ursus arctos) [38], Ethiopian wolf (Canis simensis) [39], European bison (Bison bonasus) [40], Bengal tiger (Panthera tigris tigris) [41], and Namibian leopard (Panthera pardus pardus) [17]. In this study, a relatively high level of MHC variation was found in the golden snub-nosed monkey, with 9 DQA1 and 16 DQB1 alleles found in 64 individuals. However, we found lower MHC variation in the SNJ population, which also showed lower genetic diversity in microsatellites and mitochondrial genes [27, 34]. Small sample size cannot account for the reduced variability, because sample collected across the entire distribution of the SNJ population showed similar patterns of genetic diversity [29].

For each MHC locus, regardless of how many clones were sequenced from an individual, no more than 2 alleles were observed in an individual, a strong indicator that we amplified single loci in all cases. We assumed that all alleles were from a single functional gene. The assumption that our sequences were from functional genes was supported by three findings: 1) sites inferred to have been exposed to significant selection, most of which were ABS sites, indicating historical selection at functional genes; 2) no reading frame or stop codon disruptions found in any alleles; and 3) the sequences identified from cDNA were parts of or included sequences obtained from DNA.

Historical balancing selection

Golden snub-nosed monkey MHC genes reflect historical balancing selection in that an excess of non-synonymous substitutions was mainly concentrated in the ABS (Table 3, Additional file 4: Table S3, Additional file 5: Table S4). According to neutrality theory [42], the synonymous nucleotide substitution rate is larger than the non-synonymous substitution rate because a change in amino acid sequence has a greater possibility of being deleterious. The elevated rate of non-synonymous substitutions at the ABS provided clear evidence of positive selection [43, 44] shaping genetic variation [45]. The p-value was not significant at DQA1 ABS may due to a weaker recombination within DQA1 as a lower recombination rate has been shown before. Without higher recombination and stronger selection, some DQA1 alleles might be lost when population size decreased. The sharing of MHC alleles among populations also indicates that MHC alleles may have been conserved by selection [19]. Second, random site models analysis proved the existence of historical selection based on the maximum likelihood method, which revealed that, for MHC genes, the models including selection (M2a, M3, and M8) match MHC alleles better than models without selection (Table 4,and 5). Under the M2a and M8 models, some sites of the 2 MHC loci were under significant selection pressure. Furthermore, trans-species evolution of the MHC alleles revealed historical balancing selection. Under balancing selection, some MHC alleles or allelic lineages are reported in other species, which indicates that they are ancestral alleles [46].

Genes vary in terms of the level of selection, and both DQA1 and DQB1 revealed different patterns of selection. Each population had unique DQB1 alleles, while not every population had unique DQA1 alleles (Table 2). Population divergence, measured as pairwise F ST , was larger in DQB1 than in DQA1 except for that between SG and QL (Figure 3). Similar results were reported in water vole, where balancing selection pressure was different at MHC genes in continuous populations [47]. MHC genes are assumed to be closely linked [48]. In our study, however, linkage disequilibrium between DQA1 and DQB1 was only observed in the SG population. This weak linkage disequilibrium and the different selection pressures on these loci could be a result of recombination, which is common at the MHC genes [49, 50]. The higher recombination rate that was found in DQB1 genes may explain their larger allelic richness compared to DQA1 genes. Recombinants maintained by selection may counteract the linkage of closely linked genes [51] and play an important adaptive role in DQB1 evolution. In the present study, historical selection was found, but this does not conclusively indicate that balancing selection is acting on current populations. First, an excess of non-synonymous mutations requires a long time to accumulate [47]. Once present, this pattern would take a long time to vanish after the disappearance of selection [10]. Hence, we investigated whether selection continues to play a major role at present.

Patterns of selection and drift

Although selection historically maintained MHC diversity, recent population isolation and fragmentation has increased the role of genetic drift in shaping patterns of MHC variation in snub-nosed monkeys. First, compared with neutral forces, balancing selection is supposed to diminish population differentiation as measured by conventional pairwise FST[22, 52]. Thus, the population structure of genes under balancing selection should not be pronounced [53]. However, in the present study, half of the pairwise FST values were greater than 0.05, and two FST values at DQB were even greater than those at microsatellites (Figure 3). Second, FST outlier analysis showed that the structure in the MHC loci was within the neutrality level for all populations and for each population. Considering all populations, one microsatellite (D14S306) showed a FST value that was lower than the neutral level, indicating its linkage with other genes under selection [22]. Lastly, a positive correlation was found between allelic richness in MHC and microsatellites. The QL population had the highest allelic richness in microsatellites and MHC, while SNJ had the lowest. This positive correlation indicates that genetic drift plays a significant role in maintaining MHC diversity for snub-nosed monkeys [25]. Maintenance of MHC variation through balancing selection may be hampered in small, isolated populations because of their lower effective recombination rate [54]. In all, our results indicate that even though selection acts on MHC, it is overwhelmed by genetic drift in small, isolated populations.

The positive correlation in the allelic richness of MHC and microsatellites, together with other evidence, indicates that genetic drift has a great influence on the maintenance of MHC variations in small, isolated populations of snub-nosed monkeys [7, 25, 55]. No evidence showed that MHC polymorphism had increased in populations that contained low neutral variation [27, 42]. Under neutral evolution theory [56], alleles are expected to be neutral when s < 1/2 Ne (s = selection coefficient, Ne = effective population size). Therefore, the smaller Ne becomes, the greater the likelihood of genetic drift [57]. The SNJ population is subject to more genetic drift than the other two populations as found in a previous study [29]. Other animals whose patterns of MHC polymorphism have been contributed to drift over selection include the great crested newt (Triturus cristatus) [19], black-footed rock-wallaby (Petrogale lateralis lateralis) [20], tuatara (Sphenodon spp.) [7], and the Egyptian vulture (Neophron percnopterus) [58]. These results suggest that selection on MHC is not strong enough to counteract drift that results from population fragmentation, isolation and bottleneck.

Conclusion

Variation at MHC loci is widely accepted as being maintained by balancing selection (reviewed in 2), even with a low level of neutral variability in some species [29, 59]. This shows the importance of balancing selection for maintaining variation in the field and exposes the problem of using neutral genes as substitutes for variation in fitness-related genes [32]. However, in small, isolated populations or bottlenecked populations, balancing selection is overwhelmed by drift [21]. In this study, we found the same genetic variation pattern both at neutral and MHC markers, suggesting that genetic drift was stronger than selection, thus leading to a reduction in MHC diversity in the most isolated populations. Such findings may contribute to the conservation of endangered species such as snub-nosed monkeys both in captive breeding and translocation programs. Though the relationships between MHC with mate choice and pregnancy outcomes still remain controversial [60], it could be helpful to examine MHC variation in captive individuals that are involved in mating programs. In the field, translocation of individuals from demographically and genetically healthy populations to populations that suffer from reduced genetic diversity can improve the chances of genetic recovery [61, 62]. Recently, researchers have found that inbred populations could thrive with the import of migrants as part of a conservation management program [6365]. Further, the introduction of individuals from western populations of golden snub-nosed monkeys could restore genetic diversity to the relatively homogenous SNJ groups, reducing the likelihood of inbreeding depression.

Methods

Sample collection and DNA extraction

We collected 64 R. roxellana samples (muscle, skin, and blood) from the current population (SG population = 25, SNJ population = 17, and QL population = 22) (Figure 1 and Table 1). All collections complied with the relevant animal welfare institutions and laws of China. Also, the Institute of Zoology (Chinese Academy of Sciences) provided ethical approval for this study. Muscle and skin samples were collected from carcasses that were provided by local museums and nature reserves. Skin samples were stored dry, muscle samples were stored in 95% ethanol, and blood samples were collected while trapping individuals for physical examination and were stored in a refrigerator at −80°C. During DNA extraction, benches and plasticware were washed with 10% bleach and sterile water and were exposed to UV light for 30 min prior to treatment to prevent contamination during DNA extraction. During the subsequent polymerase chain reaction (PCR), 8 extraction controls were used without any positive amplification.

Microsatellite analyses

Samples were genotyped at 16 microsatellite loci [33]. Thirty-five cycles of PCR amplification were executed at the same time for up to 3 loci with reliable genotypes, with combinations selected by fluorescent dye (HEX, ISMRA, or FAM), Tm, and fragment size using the QIAGEN Mutliplex PCR kit following the manufacturer’s protocols at optimized annealing temperatures (55°C). An ABI 377 prism automated sequencer was used to resolve products, which were then analysed using GeneScan v3.1.2 and Genotyper 2.5 (Applied Biosystems).

MHC amplification, cloning, sequencing, and expression analyses

The PCR was carried out in a 50 μL solution including 10 mM Tris–HCl (pH 8.4), 50 mM KCl, 2.5 mM MgCl2, 0.4 μM each primer, 0.2 mM each dNTP, 1.0 unit Hotstart-Taq DNA polymerase (Takara), and 10–100 ng DNA template. The amplification profile consisted of 5 min at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at 56°C (DQA1) or 58°C (DQB1), and 30 s at 72°C, with a final extension of 10 min at 72°C. The following primers were used for amplifying exon 2: 5'DQA1-AAGCCCA TAATATT TGAAAGTCAGT and 3'DQA1-TATGTGATTTTAGAGATGGGAGATG, or 5'DQB1-TGTAAAC GACGGCCAGTTCCCCGCAGAGGATTTCGTG and 3'DQB1-TGCTCTAGAGGGCGACGACGACGCCTCACCTC [66]. The DQA1 primers were designed based on BLAST sequences from relative species. According to the manufacturer’s protocols, a Wizard PCR Preps DNA Purification Kit (Promega) was used to purify the PCR products. Purified PCR products were cloned using the pMD-18T vector (Takara) following manufacturer’s instructions. Ten to twenty clones containing inserts from each individual were sequenced on an ABI 377 or ABI-PRISMTM 3100 Genetic Analyzer (Applied Biosystems Inc.) with the Prism BigDyeTM Terminator Ready Reaction Kit (Applied Biosystems Inc.).

Expression analyses were conducted to validate the expression of the obtained MHC II-DQA and DQB alleles. The RNA was isolated from 2.5 mL whole blood using Trizol (Invitrogen) following the manufacturer’s instructions. To ensure that the genomic DNA was removed from the isolated RNA, a second DNA digestion was performed using the DNase I RNase-free Set (Promega). The cDNA synthesis was obtained using 200 U of M-MuLV Reverse Transcriptase (Promega) in a 25 μL reaction tube containing 1μL Oligo(dT)12–18 primer (0.5 μg/μL; Invitrogen), 2 μg total RNA as a template, and 25 U ribonuclease inhibitor, RNase-free water, 5 μL dNTP mix (10 mM), and 5 μL 5× reaction buffer. The reactions were incubated at 42°C for 60 min. The cDNA was acquired from 2 samples and was amplified by PCR using the primer sets (Additional file 6: Table S5). The primer sets were exon-spanning to detect the amplification of genomic DNA contaminants based on the BLAST results of human and Macaca mulatta exons. All amplified cDNA products were analysed by cloning and sequencing as described above.

Data analysis

Microsatellite loci diversity analyses

Genetic diversity was measured as observed (HO) and expected heterozygosities (HE) [67]. The above analyses were computed using Arlequin v3 [68]. Inbreeding coefficients (FIS) and allelic richness (AR) analyses were performed in Fstat version 2.9.3 [69]. The AR calculations were set to the smallest population size correction. Pairwise FST among populations and Hardy–Weinberg equilibrium were analysed across all loci for each population in an exact probability test in Genepop 4.0 [70]. Using the Bonferroni correction, significance values were adjusted for multiple comparisons. Linkage disequilibrium (LD) was calculated across all loci with the Weir correlation coefficient [71] in Genepop 4.0 [70]. To determine which LD values were significant, a permutation test was used. The presence of stuttering, null alleles, and allele dropout were examined using Micro-Checker [72].

MHC diversity analyses

Sequences were aligned and translated into the amino acid sequences in Mega 4 [73]. The gene identity was verified through homology with publicized MHC alleles of other species using BLASTN (http://blast.ncbi.nlm.nih.gov/Blast.cgi) from NCBI. Because some obtained sequences are false alleles potentially corresponded to PCR amplification artifacts [47], a new sequence was assumed to be an allele when it was identified from 3 separate amplications from the same sample or from at least 3 different samples [74]. We used Mega 4 to detect the variable and parsimony-informative sites (sites with at least 2 different nucleotides or amino acids), to compute the mean number of nucleotide differences, and to derive the overall mean genetic distances of nucleotide sequences. Standard estimate errors were detected using 1,000 bootstrap replicates. Hardy–Weinberg equilibrium tests, pairwise FST values, and LD were calculated in Genepop. The AR and FIS of DQA1 and DQB1 were compared among populations using the comparison among groups of samples option in Fstat version 2.9.3 [69]. The AR estimates were adjusted to the smallest sample size.

Phylogenetic analysis and recombination analysis

To detect the genetic structures of microsatellite loci, Bayesian clustering was used in Structure 2.1 [75]. Because of the subtle population structure, correlated allele frequencies between populations and the admixture model were chosen [76]. The clusters (K) tested range was set from 1 to 10, and 10 independent runs were executed for each analysis. The Markov Chain Monte Carlo (MCMC) iterations lengths and burn-ins were set at 5,000,000 and 200,000, respectively. For most situations, ΔK had a mode at true K; thus, true K was selected using the ΔK statistic, which was calculated according to the rate of the log probability change of the data between successive K values [77].

The relationship of the MHC allele phylogenesis was constructed using a maximum likelihood method in PhyML 3.0 [78]. Before phylogenetic analysis, the most appropriate evolution models of sequence were estimated based on Akaike Information Criterion (AIC) in Modeltest 3.7 [79]. The K80 + G model was suggested to be the optimum model for MHC sequences with the gamma shape parameter α = 0.6532. The reliability of the obtained tree topology structure was carried out with 1,000 bootstrap replications. In addition, intraspecific phylogenetic structures were inferred using median-joining networks in Network 4.600 [80]. To estimate the rate of population recombination ρ (ρ = 4Ner), the composite-likelihood method [81] in LDhat [82] was used. The ρ is calculated using the crossing over for an effective population size (Ne) and rate per generation (r), and is estimated without prior information [82]. Even for the sequences that evolved under balancing selection in the existence of recombination events, the LDhat still works efficiently [83].

Detecting historical selection

Two methods were used to detect historical selection. First, we calculated the rates of non-synonymous and synonymous substitutions at all amino acid sites, ABS sites, and non-ABS in Mega 4 by the Nei-Gojobori method with the Jukes-Cantor correction [84] and 1,000 bootstrap replicates to obtain standard errors. The putative ABS and non-ABS locations were derived according to the structure of human MHC II [85]. Compare with the method that used HLA-DR1 [86] structure to derive ABS sites, Reche and Reinherz’s research provided a better way to avoid the emergence of uncertain gaps while blasting DQ1 genes with DR1 genes, and was also adopted by other studies [87, 88]. Historical selection evidence was obtained with Codeml in the Paml 4 package [89]. This procedure examined heterogeneity in ω (ω = dN/dS) [90] among codons based on the maximum likelihood method, with positive selection indicated by ω = dN/dS > 1. Six models (M0, M2a, M3, M7, M8) allowing for different selection intensity among sites were tested [91, 92]. The sites revealed to be under selection were compared with ABS sites.

Detecting recent selection

The FST outlier analysis [22] was used to examine whether MHC genes had significantly different population differentiation measures compared with microsatellite loci. Loci outside the confidence intervals of neutral FST values were likely under selection. The FST was estimated for DQA1, DQB1, and 16 microsatellite loci using Lositan [22, 93]. Simulations involving 100 demes were sampled, and HE and FST were printed for each of the 100,000 simulations. The infinite allele model was used in simulations that were performed using Fdist2. The aforementioned FST outlier analysis was also carried out on each population respectively. For each population, simulation parameters were similar using the same parameter set.