Background

The Major Histocompatibility Complex (MHC) is a family of genes that play a major role in activating adaptive immune responses [1]. Some of these gene families code for transmembrane proteins that protect individuals from viral, bacterial and parasitic infections by presenting pathogen-derived peptides to T lymphocytes, which subsequently triggers an immune response. The MHC molecular region, called HLA in humans and Patr in chimpanzees, is very similar in these two species, as orthologous genes involved in peptide presentation are physically arranged in a comparable way [2,3,4,5,6,7] (Fig. 1). These genes are organized into two classes that differ from each other based on major structural and functional differences between their corresponding proteins. The molecules expressed (on almost all nucleated cells) by the classical class I genes (named A, B and C) consist of one α chain, non-covalently bound to a small β2-microglobulin chain which is not encoded in the MHC region. The α1 and α2 domains of this heavy chain form the peptide-binding region (PBR), which presents short peptides (mostly nonamers) of intracellular origin at the cell surface to CD8+ cytotoxic T lymphocytes. In all classical MHC class I genes, the 2nd and 3rd exons encoding these two domains are highly polymorphic. Chimpanzees may also possess an additional class I A-like locus named Patr-AL, which is in strong linkage disequilibrium with Patr-A [8, 9]. However, this gene is not fixed and is present on only a portion of the haplotypes. The MHC molecules encoded by the class II genes (named DP, DQ and DR) display a more specific tissue distribution limited to professional antigen-presenting cells implicated in the immune response, i.e. mostly B lymphocytes, dendritic cells and macrophages. Contrary to class I, class II proteins are heterodimers composed of one α chain encoded by an “A” gene (named DPA, DQA or DRA) and one β chain encoded by a “B” gene (named DPB, DQB or DRB, respectively). The α1 and β1 domains of the α and β chains form the PBR, which in this case presents peptides (of about 12–15 amino acids) of mostly extracellular origin at the cell surface to CD4+ T-helper lymphocytes. The 2nd exon of most MHC class II “B” genes (which encodes the β1 domain) is highly variable, whereas that of “A” genes (which encodes the α1 domain) is much less polymorphic, except at the DQ loci. Most class II genes also exhibit one or more functional and/or non-functional (i.e. pseudogenic) copies (e.g. DRB1, DRB2, DRB3, etc.) resulting from past duplications [5, 10,11,12,13,14,15,16], but only the four most polymorphic ones, DPB1, DQB1, DQA1 and DRB1, are extensively studied.

Fig. 1

Map of the human and chimpanzee MHC region showing average physical distances between the 7 loci under study in both species. The distances between loci (in Kb = kilobases) slightly vary between the two species but they have the same order of magnitude. ~ 80 Kb stands for “physical distance between DQB1 and DRB1 is about 80 Kb”

The HLA region is amongst the most variable of the whole genome, with almost 26,000 HLA (class I and class II) alleles identified so far (November 2019, [17]). Its huge level of diversity and/or allelic variation observed within human populations is believed to be maintained by different kinds of balancing selection, most often in the form of heterozygote advantage towards a large variety of pathogens following a divergent allele advantage (DAA) model, although negative frequency-dependent (also named rare-allele advantage) and fluctuating selection in time and space also explain its remarkable variation [18,19,20,21,22]. These mechanisms maintain even HLA allele frequencies in most populations, with recurrent – although not systematic – deviations from neutral expectations towards a significant excess of heterozygotes [21, 23]. However, specific HLA alleles may also act as protective factors to highly prevalent diseases and be selected positively, one of the best examples being the putative increase of B*53 (B*53:01:01) and B*78 (B*78:01) frequencies in sub-Saharan African regions where Plasmodium falciparum malaria is endemic [24,25,26]. Recently, MHC alleles encoding for allotypes with functional similarities to those of HLA-B*53 and HLA-B*78 have also been suggested to play a protective role explaining the likely absence of malaria parasites in bonobos [27]. In addition, demographic processes such as population bottlenecks, genetic drift, demographic expansions or migrations shape the HLA molecular profiles by increasing or decreasing their diversity and create population structure most often highly correlated to geography [21, 28,29,30].

Whether and how MHC genetic variation persists in populations having undergone a pronounced reduction in size, either due to a founder effect or to an epidemic, is an important issue in evolutionary genetics and conservation biology [30,31,32,33,34]. Indeed, a loss of genetic variation, particularly concerning immune-related loci, may have dramatic effects on populations’ survival [33], even though a direct correlation between a lower MHC diversity and a greater susceptibility to diseases has not been demonstrated so far at a population level [35, 36]. In this context, theoretical and empirical studies investigating the relative effects of genetic drift and natural selection on MHC variability during population bottlenecks in different species have reported contrasting results, indicating either that balancing selection processes were efficient enough to maintain moderate to high MHC diversity [31, 37,38,39,40] or that demographic factors exerted stronger influence than selection on diversity [41, 42]. Additionally, the impact of selection may depend both on the timescales, e.g. selection would be able to restore diversity to pre-bottleneck levels after 40 generations [31], and on the specific MHC gene studied [38, 39, 41].

One useful approach to unravel the multiple mechanisms governing the evolution of the MHC region is to compare the diversity of homologous genes among closely related species that underwent distinct demographic histories. This is the case for humans and chimpanzees, which share a common ancestor dating back ~ 6–8 million years (Myr) [43, 44]. According to both archaeological and genetic data, anatomically modern humans (Homo sapiens) first appeared and expanded demographically in Africa between 300,000 and 200,000 years ago [45, 46]. They later dispersed, likely in small groups, across all continents where they eventually underwent secondary expansions, the most extensive ones (in Prehistoric times) occurring in the Neolithic [47, 48]. However, many human populations (most Amerindian, Oceanian and present-day hunter-gatherer and nomadic populations from different continents) did not undergo demographic expansions [49] and still live today in isolated areas where they experience little gene flow and rapid genetic drift [50]. Due to the paucity of fossil records [51], the reconstruction of the demographic history of chimpanzee populations relies almost exclusively on molecular analyses. The latter suggest the emergence of both common chimpanzees (Pan troglodytes, P.t. hereafter) and bonobos (Pan paniscus) in Central Africa from a common ancestor ~ 1–2 Myr ago [43, 44]; but while bonobos probably remained confined within the small geographic region that they inhabit today (a narrow territory between the Congo and Kasai Rivers), common chimpanzees expanded across a wider area of equatorial Africa where they are represented today by distinct sub-species (P.t.verus in Western Africa, P.t.ellioti in Nigeria and Cameroon, P.t.troglodytes in Central Africa, and P.t.schweinfurthii in Eastern Africa), albeit mainly within a limited rainforest habitat [52,53,54].

MHC molecular data analyses indicated that both common chimpanzees and bonobos experienced a selective sweep owing to the action of a hypothesised retroviral infection that severely reduced their population sizes (bottleneck events) [55, 56]. The first evidence comes from the observation of a reduced repertoire of allele families at the Patr-A locus compared to the HLA-A locus in humans [57], suggesting a strong selective sweep (i.e. either purifying or positive directional selection) within the chimpanzees’ MHC class I region. Indeed, whereas HLA-A alleles belong to six different allele families (A2, A10 and A19 within the A2 lineage, and A1/A3/A11/A30, A9 and A80 within the A3 lineage), all Patr-A alleles known so far are associated with the single A1/A3/A11/A30 family [57,58,59,60,61,62] and a similar observation has been reported for the Papa-A alleles [63] (Papa is the name of MHC genes in bonobos). Next, Patr- and Papa-A, -B, -C intron 2 analyses substantiated the reduced diversity observed in the Western chimpanzee (P.t.verus) and bonobo MHC class I regions as compared to HLA-A, -B, -C in humans [55, 63, 64]. In addition, microsatellite analyses in Western chimpanzees and humans revealed a reduced diversity in the Patr region in comparison to microsatellites located elsewhere in the genome [56]. Finally, chimpanzees were shown to exhibit a 95 kb deletion in the MIC region located next to locus B where the single MIC gene, which is fixed on all haplotypes, likely results from the fusion of two ancestral MICA and MICB genes still present in humans [65]. The hypothesis of a selective sweep proposed for chimpanzees finds support in the low genomic diversity found in all common chimpanzee sub-species and in bonobos, which was ascribed to a bottleneck in the ancestors of both species [44]. In addition, these genome-wide analyses also highlighted a second bottleneck occurring later (~ 500,000 years ago) in Western and Nigeria-Cameroon chimpanzees only (although not quite as severe for the latter), which would partially explain why P.t.verus generally displays lower molecular variation in nuclear genes compared to other chimpanzee (sub-)species [44, 66,67,68,69,70,71].

In this study, our objective is to assess whether the genetic diversity at different Patr genes, estimated by means of three different indexes (allelic richness, expected heterozygosity and nucleotide diversity), is significantly reduced in present-day Western chimpanzees, as a possible response to their past bottlenecks, compared to that of their HLA orthologs in human populations. The detection of a substantially reduced level of Patr diversity would be a possible indicator of depleted immunity and an additional reason to consider P.t.verus as a critically endangered subspecies [72]. Actually, we anticipate chimpanzees’ MHC diversity to be (not necessarily similar but) closer to that of small isolated, as opposed to large outbred human populations (independently of their geographical location) if demographic contractions played a major role in the MHC evolution of both species. In addition, we expect the patterns of genetic variation and linkage disequilibrium to be similar across the HLA and Patr regions if their orthologous loci evolved through analogous molecular mechanisms and were targeted by similar selective pressures in the two species. To address these issues, we analysed all the data currently available for 7 Patr genes (A, B, C, DRB1, DQA1, DQB1 and DPB1) in four P.t.verus cohorts, and we compared them to large sets of data for the orthologous HLA genes (A, B, C, DRB1, DQA1, DQB1 and DPB1) previously studied in human populations from different continents, which we also extensively reanalysed. We found marked similarities in Patr and HLA genetic diversity and linkage disequilibrium patterns, indicating highly conserved mechanisms of MHC evolution in chimpanzees and humans. We also showed that Western chimpanzees globally exhibit similar diversity levels and equivalent amounts of linkage disequilibrium to those estimated in small isolated human populations, which suggests that their past bottleneck exerted a substantial effect on the molecular diversity of Patr genes. However, as there was no difference in the MHC diversity of chimpanzees compared to human populations that likely underwent more recent, rapid genetic drift, we hypothesize that several Patr genes rapidly recovered molecular variation after their selective sweep.

Results

Hardy-Weinberg equilibrium and selective neutrality

The results of Hardy-Weinberg equilibrium (HWE) and Ewens-Watterson-Slatkin (EWS) tests are provided in Table 1 (for the pooled chimpanzee cohort and the multiple human populations) and Additional Tables S1 (for the individual chimpanzee cohorts) and S2 (for the individual human populations).

Table 1 Results of Hardy-Weinberg equilibrium (HWE) and Ewens-Watterson-Slatkin (EWS) tests at seven MHC loci in chimpanzees (pooled cohort) and humans (multiple populations)

No deviation from HWE was observed at any Patr locus for any of the four individual cohorts and the pooled cohort of chimpanzees. The computed allele frequencies (see below) could thus accurately be used as population frequencies to compare cohorts among them and with human populations as well as to estimate other parameters requiring HWE (e.g. heterozygosity). Additionally, we found no significant deviations (after correction for multiple testing) of allele frequency distributions from neutral expectations based on the EWS test.

All human populations were also found to be in HWE both before (except the Mixe (Mexico/Oaxaca) at DRB1) and after correction for multiple testing. Contrary to chimpanzees, however, a few significant rejections of selective neutrality were still found in human populations after correction for multiple testing, i.e. towards an excess of heterozygotes at loci A (3.7%), DRB1 (7.9%) and DQB1 (2.5%) and towards an excess of homozygotes at locus DPB1 (2%), but none at loci DQA1, B and C.

To control for the large differences in sample sizes between chimpanzees (average N = 45.57 ± 7.76 on the 7 loci in the pooled cohort) and humans (average N = 109.2 ± 17.31 on the 7 loci and the multiple populations), we also tested HWE and selective neutrality on 1000 simulated sub-samples drawn randomly from each human population, each simulated sub-sample being of the same size as the pooled cohort of chimpanzees (see Methods). As a result, we observed various proportions of HWE deviations in the simulated sub-samples depending on the locus (average proportion ± 2 × standard error; DPB1: 8.05 ± 8.20%, DQB1: 13.38 ± 9.83%, DQA1: 6.89 ± 9.40%, DRB1: 10.37 ± 8.99%, B: 3.36 ± 5.05%, C: 3.20 ± 4.93%, A: 5.77 ± 6.6%; Additional Table S2). As almost all human populations of the original dataset were in HWE, this overall result allowed us to conclude that a reduction in sample size sometimes leads to type I errors, i.e. false positives, at loci DQB1 and DRB1 (the only proportions significantly different from 0). However, for the Mixe from Mexico/Oaxaca, which was the only population for which HWE was rejected before correction for multiple testing (at locus DRB1) in the original dataset, HWE was rejected in all (i.e. the 1000) simulated sub-samples, a result that never occurred otherwise (Additional Table S2). This indicates that the power of the test is robust to a reduction in sample size, and that the observation of no HWE rejection in the chimpanzee samples truly reflects HWE in the corresponding cohorts.

Regarding selective neutrality, our simulations failed to reject the null hypothesis in various proportions of simulated sub-samples drawn from populations for which neutrality was initially rejected (10% at locus DPB1, 2.1% at locus DQB1, 28.3% at locus DRB1 and 18.4% at locus A, Additional Table S2). In this case, the absence of significant deviations from neutrality observed in chimpanzees could thus correspond to type II errors, i.e. false negatives, due to a lack of power of the neutrality test when applied to small sample sizes, although this occurred in a minority of cases according to our simulations (less than 30%).

Genetic diversity

Allele frequencies estimated in the pooled cohort of chimpanzees are given in Table 2 and Additional Table S1 (for the individual chimpanzee cohorts). Allelic distributions found at the three class I loci B, C and A and at DRB1 are much more diverse than those observed at DQB1 and DQA1 and, to a lesser extent, DPB1. Moreover, at loci DQB1 and DQA1, three alleles account for more than 84.5% of the allele frequencies. A greater number of low-frequency alleles are observed for loci B, C and A than for class II loci (in light grey in Table 2, see also Supplementary Text for a comparison between chimpanzee cohorts and human populations).

Table 2 Allele frequencies at each Patr locus in the pooled cohort of chimpanzees

The three genetic diversity indexes estimated at the seven Patr genes in the four cohorts and the pooled cohort of chimpanzees are given in Table 3 and plotted in Fig. 2, and the corresponding values are provided in Additional Table S3.
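
As an illustration of how the three indexes can be computed, the following minimal Python sketch uses standard textbook estimators; it is not the software used in this study. It assumes Nei's unbiased gene diversity for expected heterozygosity, rarefaction to a common number of gene copies for allelic richness, and the frequency-weighted mean proportion of pairwise differences for nucleotide diversity. All function names, allele labels and sequences are hypothetical.

```python
from itertools import combinations
from math import comb


def expected_heterozygosity(allele_counts):
    """Nei's unbiased gene diversity from a dict of allele -> gene-copy count."""
    n = sum(allele_counts.values())
    homozygosity = sum((c / n) ** 2 for c in allele_counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random draw of g gene copies."""
    n = sum(allele_counts.values())
    return sum(1.0 - comb(n - c, g) / comb(n, g) for c in allele_counts.values())


def nucleotide_diversity(allele_counts, sequences):
    """Mean proportion of differing sites between two randomly drawn gene copies."""
    n = sum(allele_counts.values())
    total = 0.0
    for a, b in combinations(allele_counts, 2):
        seq_a, seq_b = sequences[a], sequences[b]
        # Proportion of aligned positions at which the two alleles differ.
        diffs = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
        total += 2 * (allele_counts[a] / n) * (allele_counts[b] / n) * diffs
    return n / (n - 1) * total


# Hypothetical locus with three alleles observed in 10 gene copies.
counts = {"a1": 6, "a2": 3, "a3": 1}
seqs = {"a1": "AAAA", "a2": "AAAT", "a3": "AATT"}
print(expected_heterozygosity(counts))
print(rarefied_allelic_richness(counts, 5))
print(nucleotide_diversity(counts, seqs))
```

Rarefaction makes allelic richness comparable between samples of different sizes, which matters here because the chimpanzee cohorts are much smaller than most human samples.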

Table 3 Genetic diversity at 7 MHC loci in chimpanzees (average on all chimpanzee cohorts and in the pooled cohort) and human populations (averaged on multiple populations)

Fig. 2

Genetic diversity indexes estimated in chimpanzee cohorts and human populations. Left panels: allelic richness (top), heterozygosity (middle) and nucleotide diversity (bottom) at the seven studied MHC loci in the pooled cohort of chimpanzees (in red) and averaged on multiple human populations (in blue). The pooled cohort includes all cohorts except Texascb. Middle panels: allelic richness (top), heterozygosity (middle) and nucleotide diversity (bottom) at the seven studied MHC loci in each cohort of chimpanzees (in red) and for the human populations (in blue) represented as violin plots. The values calculated for each chimpanzee cohort are indicated by filled and unfilled shapes for cohorts of wild-born and captive-born chimpanzees, respectively. The values calculated for the human populations (average number of k = 70 (s.d 15.9) samples of average size N = 109.2 (s.d 17.31)) are shown as violin plots. The width of the violin varies so as to represent the probability density of the data, the thick black bar in the centre represents the interquartile range, the thin black line extended from it represents the 95% confidence intervals, and the blue dot is the median. Right panel: allelic richness (top), heterozygosity (middle) and nucleotide diversity (bottom) at the seven studied MHC loci in each cohort of chimpanzees (in red) and for the human populations (in two shades of blue) represented as violin plots. The values calculated for each chimpanzee cohort are indicated by filled and unfilled shapes for cohorts of wild-born and captive-born chimpanzees, respectively. The values calculated for the human population are plotted as violin plots, in light blue for small sized and isolated populations that likely experienced rapid genetic drift (RGD) and in dark blue for large outbred populations with slow genetic drift (SGD).

In agreement with the observed allele frequency distributions, both allelic richness and heterozygosity show greater values at the three class I loci A, B, C and at DRB1 than at DQA1, DQB1 and DPB1 (to a lesser extent for the latter). Based on the loci for which data were available in (at least one) captive and wild cohorts (DQB1, DRB1, B, C, A), we also observe significantly higher values of these indexes in the captive-born Texascb and Yerkescb than in the wild-born BPRCwb and Kumawb cohorts (Wilcoxon tests, p = 0.0036 and p = 0.0034, for allelic richness and heterozygosity, respectively). By contrast, nucleotide diversity is greater at DRB1, DQA1, DQB1 and B (to a lesser extent in the two latter) than at A, C and DPB1, and no significant differences are observed between the cohorts (Wilcoxon test, p = 0.769).

Like in chimpanzees, both the allelic richness and the heterozygosity estimated in human populations are, on average, greater at the three class I loci A, B, C and at DRB1 than at DQA1, DQB1 and DPB1 and the nucleotide diversity is greater at loci DRB1, DQA1, DQB1 and B than at A, C and DPB1 (Table 3 and Fig. 2). The overall patterns of genetic diversity are therefore similar in the two species. This is also supported by comparing the ordering of the seven MHC loci based on decreasing values of the three diversity indexes (Table 4): identical orders are found for several loci, and small differences are most often observed otherwise. These results suggest that the mechanisms generating diversity at the MHC genes are similar, and thus highly conserved, in the human and chimpanzee lineages.

Table 4 Ordering of the MHC loci based on decreasing values of three genetic diversity indexes in chimpanzees (pooled cohort) and humans (average on multiple populations)

Looking in more detail at the results obtained for individual MHC genes, some significant differences are nevertheless observed between the two species. Compared to humans, chimpanzees show a lower heterozygosity, allelic richness and nucleotide diversity at DQB1 (Wilcoxon test, p = 0.016, 0.017 and 0.021, respectively), as well as a lower nucleotide diversity at C and A (Wilcoxon test, p = 0.011 and 0.019, respectively) and a higher nucleotide diversity at B (Wilcoxon test, p = 0.009) (Table 3 and Fig. 2). We obtained similar results by redoing these comparisons without considering the Texascb cohort, which includes individuals of uncertain sub-species (Wilcoxon test: p = 0.019, 0.03 and 0.041 for heterozygosity, allelic richness and nucleotide diversity at DQB1; and p = 0.025, 0.013 and 0.047 for nucleotide diversity at B, C, and A, respectively, see also Additional Figure S1). However, according to both sets of comparisons (i.e. with and without the Texascb cohort), none of these differences remained significant after correction for multiple testing on the number of loci. This confirmed our previous conclusion that chimpanzees and humans display similar patterns of genetic diversity across the whole MHC region (Fig. 2, left and central panels).
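
The effect of correcting for the number of loci can be checked with a short calculation. The sketch below assumes a Bonferroni correction over the seven loci, which is an assumption made for illustration since the correction procedure is not specified in this section; the p-values are those reported above for the comparisons including all cohorts, and the function name is hypothetical.

```python
def bonferroni_flags(p_values, n_tests, alpha=0.05):
    """Return, for each comparison, whether p falls below alpha / n_tests."""
    threshold = alpha / n_tests
    return {label: p < threshold for label, p in p_values.items()}


# Uncorrected Wilcoxon p-values reported in the text (all cohorts included).
reported_p = {
    "DQB1 heterozygosity": 0.016,
    "DQB1 allelic richness": 0.017,
    "DQB1 nucleotide diversity": 0.021,
    "C nucleotide diversity": 0.011,
    "A nucleotide diversity": 0.019,
    "B nucleotide diversity": 0.009,
}

# Assumption: the family of tests is the 7 loci, as stated in the text.
flags = bonferroni_flags(reported_p, n_tests=7)
print(flags)
```

Under this assumption the per-test threshold is 0.05 / 7 ≈ 0.0071, and the smallest reported p-value (0.009) lies above it, in line with the statement that none of the differences remains significant after correction.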

Genetic diversity in chimpanzees compared to small and large human populations

Following the idea, based on demographic knowledge, that chimpanzees would be genetically more similar to human populations displaying limited population sizes, we also compared the three diversity indexes between the chimpanzees and the human populations classified either as RGD (small isolated populations that likely underwent Rapid Genetic Drift) or as SGD (large outbred populations that likely underwent Slow Genetic Drift) (see Methods).

Interestingly, in the chimpanzee cohorts, and particularly so in the wild-born BPRCwb and Kumawb, both the allelic richness and heterozygosity (at all loci except A) are close to the lowest values found for these indexes in human populations, which correspond to those observed in RGD populations (Fig. 2, right panels). Actually, at these loci, chimpanzees exhibit no significant differences compared to RGD populations, whereas all differences (except heterozygosity at DPB1) are significant compared to SGD populations. In addition, chimpanzees exhibit significant nucleotide diversity differences compared to SGD populations at three loci, DPB1, DQB1 and A (Additional Table S4). After correction for the number of loci tested, the three diversity indexes appear to be both similar between chimpanzees and RGD populations at all loci (except one borderline case, nucleotide diversity at locus B) and different between chimpanzees and SGD populations (at least two loci remain highly significant after correction). This strongly suggests that demographic contractions globally exerted a similar effect, i.e. a decrease in the level of diversity, on Patr and HLA genes.

Again to control for the discrepancy in sample sizes between chimpanzees and humans, we re-estimated allelic richness, heterozygosity and nucleotide diversity on 1000 simulated sub-samples randomly drawn for each human population. For the three diversity indexes, the values (in all cases at a precision of one decimal, but most often, even at two) observed for the original human population samples were always found to fall within the 95% confidence interval of their simulated sub-samples (Additional Table S2). In addition, the relative position of each genetic diversity index observed in the pooled cohort of chimpanzees - i.e. either within or outside the 95% confidence interval - was identical when compared both to the confidence interval of the original human population samples and to that of the 1000 simulated sub-samples (Additional Figure S2). This substantiated our previous conclusion that chimpanzees and human RGD populations exhibit similar MHC diversity patterns.
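
The sub-sampling control described above can be sketched as follows. This is a minimal illustration rather than the procedure detailed in Methods: it assumes that gene copies (not genotypes) are drawn without replacement, that the 95% interval is taken from the empirical 2.5% and 97.5% quantiles, and that gene diversity is the statistic of interest. The function names and the example sample are hypothetical.

```python
import random


def gene_diversity(sample):
    """Nei's unbiased gene diversity for a list of allele labels."""
    n = len(sample)
    freqs = [sample.count(a) / n for a in set(sample)]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def subsample_interval(gene_copies, size, statistic, n_draws=1000, seed=0):
    """Empirical 95% interval of a statistic over random sub-samples."""
    rng = random.Random(seed)
    values = sorted(
        statistic(rng.sample(gene_copies, size)) for _ in range(n_draws)
    )
    lower = values[int(0.025 * (n_draws - 1))]
    upper = values[int(0.975 * (n_draws - 1))]
    return lower, upper


# Hypothetical human sample of 200 gene copies, sub-sampled to 90 copies
# (an arbitrary size standing in for the pooled chimpanzee cohort).
human_sample = ["h1"] * 80 + ["h2"] * 60 + ["h3"] * 40 + ["h4"] * 20
low, high = subsample_interval(human_sample, 90, gene_diversity)
print(low, gene_diversity(human_sample), high)
```

If the value estimated on the full sample falls inside the interval obtained from the sub-samples, the reduction in sample size is unlikely to have biased the comparison for that index.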

Linkage disequilibrium

In chimpanzees, global linkage disequilibrium (GLD) appears to be significant between the three class II loci DQA1, DQB1 and DRB1 (i.e. pairs DQA1 ~ DRB1, DQB1 ~ DRB1 and DQB1 ~ DQA1) as well as between the two class I loci B and C (pair B ~ C) (Table 5, see also Supplementary Text), as indicated by the results obtained for the BPRCwb cohort, i.e. the cohort including the greatest number of animals and the only one for which all loci were tested (Additional Table S5). These pairs of loci (actually those that are closest to each other on the chromosome, see Fig. 1) also display the highest proportions of individual haplotypes in linkage disequilibrium (Additional Tables S6 and S7), which strongly supports the observed GLD pattern.

Table 5 Results of Global Linkage Disequilibrium (GLD) significance test (PRS resampling procedure) between different pairs of MHC loci in chimpanzees (BPRC cohort) and humans (multiple populations, further subdivided into RGD and SGD populations)

These results are again similar in humans. Indeed, significant GLD is observed for the same pairs of loci DQA1 ~ DRB1, DQB1 ~ DRB1, DQB1 ~ DQA1 and B ~ C in the majority (more than 70% and up to 98%) of human populations (Table 5), and the highest proportions of individual haplotypes in significant linkage disequilibrium are also observed at these loci pairs in humans (Additional Table S6 and Additional Table S8, respectively). Therefore, as for genetic diversity, the patterns of linkage disequilibrium observed across the MHC loci are highly conserved in the human and chimpanzee lineages.
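
For a single haplotype, linkage disequilibrium is commonly summarised by the coefficient D and its normalised form D′. The sketch below computes both from one haplotype frequency and the two corresponding allele frequencies. It illustrates the quantity underlying haplotype-level disequilibrium and is not the resampling test used for GLD in this study; the function name and the example frequencies are hypothetical.

```python
def pairwise_ld(hap_freq, p_a, p_b):
    """Return (D, D') for alleles a and b given their haplotype frequency.

    D  = f(ab) - p(a) * p(b)
    D' = D / D_max, where D_max is the largest |D| compatible with the
    allele frequencies for the observed sign of D (Lewontin's normalisation).
    """
    d = hap_freq - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' keeps the sign of D here; it is set to 0 when D_max is 0.
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical example: allele a (frequency 0.3) is always found with allele b
# (frequency 0.5), so the a-b haplotype frequency equals 0.3.
d, d_prime = pairwise_ld(0.3, 0.3, 0.5)
print(d, d_prime)
```

Because D′ is scaled by the maximum attainable value, it allows haplotypes built from alleles with very different frequencies to be compared across loci pairs.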

Linkage disequilibrium in chimpanzees compared to small and large human populations

When comparing human RGD and SGD populations, the highest proportions of significant GLD are always found among the former, except for one pair of loci, DQB1 ~ DQA1 (Table 5). Actually, we find both significantly higher proportions of GLD and significantly higher average proportions of haplotypes in linkage disequilibrium in RGD than in SGD populations (Wilcoxon test: p = 0.014 and p = 0.012) (Additional Table S6 and Additional Table S8), which indicates that, globally, demography (i.e. genetic drift) did play a substantial role in the generation of linkage disequilibrium at the HLA loci. However, this effect appears to be less pronounced at the DQA1 ~ DRB1, DQB1 ~ DRB1 and DQB1 ~ DQA1 pairs.

Simulations performed on 1000 randomly drawn human population sub-samples show a tendency to under-estimate GLD when sample sizes are low, except for pairs DQB1 ~ DQA1, DQB1 ~ DRB1, DQA1 ~ DRB1 and B ~ C (considering samples with GLD in more than 900 sub-samples, we observe between half and two thirds less GLD in the simulated sub-samples except at these four loci pairs) (Additional Table S9). This suggests that the non-detection of significant GLD in chimpanzees for loci pairs other than DQA1 ~ DRB1, DQB1 ~ DRB1, DQB1 ~ DQA1 and B ~ C has a substantial probability of being due to type II errors (false negatives). Regarding individual haplotypes, the proportion of haplotypes in significant LD among 1000 simulated sub-samples drawn from human populations is largely under-estimated, being on average 1.5 to 2 times lower than in the original samples (Additional Table S9). Again, this suggests that the proportion of individual haplotypes in significant LD is mostly underestimated in chimpanzees, which may explain why it is up to 3 times lower than that observed in humans at most pairs of loci (Additional Table S6). This means that, overall, chimpanzees are expected to display more GLD and more haplotypes in significant LD than observed in our study, which supports our previous conclusion of their greater resemblance to RGD than to SGD populations.

Discussion

Strong conservation of MHC diversity patterns in humans and chimpanzees

Based on three distinct and complementary statistics describing genetic variation within populations (allelic richness, heterozygosity and nucleotide diversity), this study has disclosed highly similar patterns of genetic diversity across seven orthologous MHC loci in chimpanzees and humans: overall, both allelic richness and heterozygosity are greater at the three class I loci A, B, C and at DRB1 than at DQA1, DQB1 and DPB1, and nucleotide diversity is greater at loci DRB1, DQA1, DQB1 and B than at A, C and DPB1 (Fig. 2 and Table 4). In addition, based on both global tests and counts of individual haplotypes, we found similar patterns of linkage disequilibrium across Patr and HLA genes: both highly significant GLD and the highest proportions of individual haplotypes in significant linkage disequilibrium are observed for the same pairs of loci DQA1 ~ DRB1, DQB1 ~ DRB1, DQB1 ~ DQA1 and B ~ C (Table 5 and Additional Table S6), which parallels the strong resemblance between Patr and HLA physical maps in chimpanzees and humans, respectively (Fig. 1). These results indicate that the MHC diversity patterns are highly conserved in the human and chimpanzee lineages and that analogous mechanisms drove the evolution of this genomic region in the two species since their divergence from a common ancestor.

Molecular mechanisms generating diversity at MHC genes

In support of the hypothesis that analogous mechanisms drove the evolution of the MHC region in chimpanzees and humans, it has been suggested that the molecular processes generating nucleotide (and hence also allelic) diversity at most MHC loci are similar in both species: new variants would be mainly generated through point mutations at loci DQB1, DQA1, C and A, through recombination and/or gene conversion at loci DRB1 and B, and through both kinds of mechanisms at DPB1 [58, 73]. This would partly explain why loci DRB1 and B most often exhibit higher nucleotide and allelic diversity than the other class I and class II loci. Interestingly, chimpanzees contrast with macaques [74, 75] and (to some extent) orangutans [76] and gorillas [77], as the MHC polymorphism of these species (Mamu, Popy and Gogo, respectively) would also evolve through gene duplications at both loci B and A.

Signatures of demography on Patr and HLA loci

Besides the mechanisms generating diversity at the molecular level, both demographic processes and natural selection are known to shape the patterns of populations’ genetic diversity at MHC genes, with possible confounding effects [30, 31, 78]. In this regard, it has been suggested that chimpanzees and humans underwent distinct demographic histories [44, 54, 66, 69, 71, 79,80,81,82,83] that probably affected their MHC profiles in different ways [55, 56, 61, 62]. However, demographic evolution has not been uniform in all human populations either [49]. In order to better disentangle the evolutionary mechanisms that drove the evolution of MHC genes in the two species, we thus compared chimpanzees to many different human populations displaying a wide diversity of demographic histories [84] and living in distinct geographical locations, and hence also being subjected to very diverse environmental pressures [23, 85].

As expected, large ranges of genetic diversity values were observed among human populations (Fig. 2). Interestingly, the three genetic diversity indexes (allelic richness, heterozygosity, and nucleotide diversity) appeared to be similar between chimpanzees (especially the wild-born cohorts BPRCwb and Kumawb) and the small isolated human populations that likely underwent rapid genetic drift (RGD), regardless of the geographic regions or continents where these human populations lived, and different between chimpanzees and the large outbred (SGD) human populations (Fig. 2 and Additional Figure S3). As an example, the very low nucleotide diversity found at the four Patr genes DPB1, DQB1, C and A is comparable to that found at the orthologous HLA genes in two RGD groups, Amerindians and Australian Aborigines (Additional Figure S3). Because human populations living in America and Australia and chimpanzees living in sub-Saharan Africa are unlikely to have experienced the same pathogenic pressures, comparable demographic histories (i.e. limited population sizes) better explain the similarities than convergent selective effects. Regarding linkage disequilibrium, our simulations indicated that we probably underestimated the amount of GLD and individual haplotypes in significant LD in chimpanzees. This argues in favour of a putative greater resemblance between chimpanzees and RGD populations (which display high levels of linkage disequilibrium) than between chimpanzees and SGD populations, as a result of genetic drift.

Actually, the idea that Western chimpanzees underwent a substantial reduction in population size has been supported by analysing other parts of the genome. First, studies on both autosomal genes and whole genome sequences [44, 66, 67, 70, 86,87,88] have indicated that Western chimpanzees are generally less diverse than the other Pan sub-species, which supports the hypothesis of several past bottlenecks in the former [44, 79]; second, Western chimpanzees’ genomic diversity has been found to fall within the average observed for non-African human populations, which show a much lower genetic diversity than African populations (Fig. 1b of [44]). Therefore, although MHC genes are known to be targets of natural selection, our study reveals that traces of past bottlenecks that impacted non-MHC genes are detectable when analysing the genetic diversity patterns of Patr genes, and more particularly that of the four loci DPB1, DQB1, C and A.

Signatures of natural selection on Patr and HLA loci

At the other three MHC loci (DQA1, DRB1 and B), the genetic diversity observed in Western chimpanzees does not simply mirror that of human populations that likely underwent rapid genetic drift (RGD). Indeed, the nucleotide diversity observed in chimpanzees is either similar to or greater than (significantly at locus B) that found in human populations with very diverse demographic histories, e.g. in Africa and Europe (Fig. 2, central pane and Additional Figure S3). Furthermore, chimpanzees exhibit both high nucleotide diversity and low heterozygosity compared to human populations at locus B, while the reverse (i.e. low nucleotide diversity and high heterozygosity) is found at loci A and DPB1. The differences observed between these genes (DQA1, DRB1 and B) and the others (DPB1, DQB1, C and A) are thus probably due to more complex mechanisms involving not only demography (as described above) but also natural selection, i.e. distinct susceptibilities of different Patr genes to pathogenic environments. This does not contradict the fact that we did not detect significant departures from selective neutrality for Patr genes in the studied chimpanzee cohorts, as our simulations showed that these results could be due to type II errors.

To better understand how the MHC polymorphism could have evolved in chimpanzees under simultaneous demographic and selective forces, we must first consider which kinds of natural selection may have targeted different Patr genes. According to the scenario that was initially proposed by de Groot et al. [55], a specific mechanism would have substantially affected the MHC genetic profile of chimpanzees, namely a strong selective sweep owing to the action of a viral pathogen (the simian form of HIV, i.e. SIV or a related retrovirus) decimating this species ~ 2 to 3 million years ago, followed by a second bottleneck in the Western subspecies [44]. As a consequence, many Patr class I alleles would have been lost and the only surviving individuals would have been those carrying alleles providing resistance to the involved pathogen [55, 56]. This loss of diversity would have specifically affected the Patr-A gene, because at this locus all alleles of a single lineage, A2, were virtually lost, but also the Patr-B and -C genes, based on molecular evidence at intron and MIC regions (see Background above). Actually, as the selective sweep that affected Patr genes had apparently been quite substantial, we would have expected significantly lower (rather than similar) levels of MHC genetic diversity in Western chimpanzees than in small isolated human populations that started to lose diversity much more recently (i.e. at most since modern human populations left their homeland in sub-Saharan Africa). However, we have to consider here that chimpanzees had a long time to restore genetic diversity by expanding again demographically after the bottleneck(s) that affected them well before the emergence of modern humans.

Besides a selective sweep, however, balancing selection (in the form of heterozygote advantage) is another mechanism that did affect the evolution of Patr genes. Indeed, MHC genes were found to present strong signals of balancing selection in all great apes’ lineages [89]. Moreover, this kind of selection explains the sharing of ancient MHC lineages by humans and chimpanzees at loci DQB1, DQA1, DRB1, C and A [5, 16, 73, 90,91,92]. Actually, many works suggest that MHC genes are potential targets of both directional (selective sweep) and balancing selection [78, 93, 94].

Evolution of Patr genes’ diversity: tentative scenarios

Taking the different evolutionary mechanisms mentioned above into account, i.e. mutational/recombination events generating molecular diversity, as well as demographic processes and distinct kinds of natural selection increasing or decreasing the levels of genetic diversity, the results uncovered by the present study support original scenarios for the evolution of Patr genes.

For class I genes, we principally hypothesize that the genetic diversity of Patr-A and Patr-B regenerated when Western chimpanzees expanded demographically (although to a small extent) after the bottlenecks that occurred, first, ~ 2 to 3 million years ago in the ancestors of chimpanzees and bonobos [55, 56] and, later on, about 500,000 years ago in the likely differentiated Western chimpanzee subspecies [44]. This idea finds good support in the equivalent amounts of Patr class I nucleotide diversity found in Western (P.t.verus) and Central (P.t.troglodytes) chimpanzees (Fig. 3), in spite of the latter having experienced the least severe population bottleneck among all Pan subspecies [44].

Fig. 3

Nucleotide diversity at MHC loci and other genomic regions in Western chimpanzees (A) and in different sub-species of chimpanzees and bonobos (P. paniscus) (B). R1: Non-coding autosomal regions [66]; R2: Non-coding autosomal regions [67]; R3: Xq13.3 [95]; R4: Non-coding autosomal regions [82]; R5: Mitogenome [82]; R6: Mitogenome [54]; Patr/Papa-B, C, A: average nucleotide diversity for genes Patr/Papa-B, -C, -A: this study, [61, 62]. No data is available for R1, R2 and R3 in P.t.ellioti, for R3 in P.paniscus and for R3 in P.t.schweinfurthii. Values are given in Additional Table S10.

This recovery of genetic variation would have occurred, however, through distinct mechanisms and with distinct intensities at the two loci Patr-A and Patr-B. At Patr-B, recombination and/or gene conversion would have rapidly created new alleles and highly divergent sequences, explaining why chimpanzees, like humans, display higher nucleotide diversity at this locus than at the other class I genes. Asymmetric balancing selection, whereby heterozygotes with more divergent alleles would have an advantage [21, 96], would have also acted on Patr-B, as this type of selection also tends to increase nucleotide diversity [97]. Noteworthy is the fact that particular HLA-B alleles have been positively selected in African human populations in response to Plasmodium falciparum malaria [26] and that functionally similar alleles have recently been identified in bonobos which live in an area with a high prevalence of this parasite [27]. If we assume that common chimpanzees underwent similar responses to pathogens, locus Patr-B (like HLA-B in humans [26]) would have been affected by a (relatively) soft selective sweep whereby several alleles have been positively selected, thus explaining both the high cumulative frequency of three Patr-B alleles (see Supplementary Text) and the high values of heterozygosity and allelic richness found at this locus. By contrast, at Patr-A new variants would have primarily been generated by point mutations that accumulate at slow rates during evolution, which may explain the low nucleotide diversity observed at this locus. Nevertheless, the high heterozygosity found at Patr-A (actually slightly higher than at Patr-B and HLA-A) suggests that heterozygote advantage also had a substantial effect on this gene after its drastic loss of diversity, possibly as an efficient way to rapidly restore a minimal immune protection despite a slow regeneration of diversity through point mutations. 
Interestingly, Patr-A molecules display a lower peptide binding repertoire than Patr-B and HLA-A [98], suggesting that they have also evolved a peptide binding site that is more promiscuous [64, 99] as a compensation for their severe loss of diversity or that promiscuous alleles were selected preferentially [31]. Finally, the genetic diversity of Patr-A and Patr-B might have evolved in concert according to a model of joint asymmetric selection as proposed for HLA-A and HLA-B [64, 100], allowing distinct levels of polymorphism to be maintained at the two loci as long as both of them have jointly ensured a sufficient immune protection.

Compared to Patr-A and Patr-B, Patr-C displays a lower level of nucleotide diversity in chimpanzees, as in humans. Knowing that both Patr-C and HLA-C molecules are ligands for killer-cell immunoglobulin-like receptors (KIR) [101, 102], the interaction of HLA and KIR molecules being crucial to regulate the killer function of natural killer cells [103], Patr-C molecules were probably subject to functional constraints similar to those acting on HLA-C, resulting in substantial directional and/or purifying selection. However, contrary to Patr-DQB1, for which we suppose the same kinds of selection as for Patr-C (see below), the strong linkage disequilibrium that characterizes the B ~ C loci pair in chimpanzees and humans might have attenuated the opposite effects of balancing and positive/purifying selection impacting loci B and C, respectively.

For class II genes, our results also indicate distinct evolutionary histories for the different loci. As MHC class II genes more specifically respond to parasitic and bacterial infections, they would have been less directly impacted by the viral epidemic proposed in [55, 56]. Moreover, the selection criteria are also less strict as class II genes are generally more promiscuous binders and select longer peptides for binding. Nevertheless, like Patr-A, it is likely that Patr-DRB1 underwent a substantial selective sweep reducing the number of allele lineages, as inferred from its much lower allelic richness compared to Patr-B and in agreement with the apparent loss of all alleles belonging to the DRB1*04 lineage [5]. Such selection would have been mostly independent from that affecting class I genes – i.e. possibly involving other pathogens – since global linkage disequilibrium is not significant between DRB1 and class I genes. Also, because Patr-DRB1 evolves through recombination and/or gene conversion, its putative loss of diversity in the past would have been followed, as proposed above for Patr-B, by a rapid regeneration of nucleotide diversity, which is particularly high at this locus (Fig. 2). Note also that in chimpanzees, MHC class II diversity is particularly high at the haplotype level thanks to inter-locus recombinations despite important loss of variation at single genes due to the past selective sweep [104].

Patr-DQA1 and Patr-DQB1 exhibit contrasting levels of nucleotide diversity and heterozygosity, high for DQA1 and low for DQB1 (note, however, that Patr-DQA1 data were only available for one cohort of chimpanzees, BPRCwb), despite the fact that these two genes are in strong linkage disequilibrium and encode the two complementary chains of the Patr-DQ molecules. Among all loci tested, Patr-DQB1 is actually the most divergent from its orthologue in humans for these two indexes (Fig. 2). Studies have stressed the fact that DQ molecules evolve under purifying selection due to strong functional constraints and with a limited dynamic of evolution in both humans and chimpanzees [56, 105]. The low diversity found at Patr-DQB1, with a single allele (DQB1*03:02) reaching a frequency above 60% in the BPRCwb cohort (Additional Figure S4), would indicate a stronger constraint on the β chain. By contrast, Patr-DQA1 would have evolved by maintaining several alleles (although a limited number, like at DQB1) at more even frequencies, as also observed for HLA-DQA1 in human populations [21, 106]. Based on our results, we also hypothesize that the very high nucleotide diversity observed at Patr-DQA1 (the highest of all studied loci) results from a molecular evolution mainly characterized by recombination and/or gene conversion rather than point mutations.

Finally, the low nucleotide diversity (and, to a lesser extent, allelic richness) found at Patr-DPB1 is comparable to that observed at HLA-DPB1 in small-sized and isolated populations that likely experienced rapid genetic drift (such as Australian Aborigines and Amerindians), although this is not the case when looking at heterozygosity (Additional Figure S3). These results suggest an effect of balancing selection in the form of heterozygote advantage (explaining the high level of heterozygosity) combined with a slow generation of diversity through point mutations (explaining the low nucleotide diversity, the opposite of what is observed for Patr-DQA1), as suggested for Gogo-DPB1 in gorillas [107]. Interestingly, the low nucleotide diversity observed at DPB1 appears to be rather close to that observed at neutral genomic regions, although the whole Patr region is clearly exceptionally diverse in this respect (Fig. 3 and Additional Table S10).

The main mechanisms that would explain the evolution of the different Patr genes after the ancient bottlenecks that affected Western chimpanzees are illustrated in Fig. 4.

Fig. 4

Schematic representation of the evolutionary mechanisms explaining the genetic diversity observed in Patr genes. For each diversity index, the Patr loci are plotted according to the values given in Table 3 for the pooled cohort of chimpanzees. The pooled cohort includes all cohorts except Texascb

Conclusions

By revealing similar patterns of genetic diversity and linkage disequilibrium in Western chimpanzees and humans across the main MHC loci, our study suggests that these genes have been shaped by analogous mechanisms in both species despite several million years of independent evolution. This led us to conclude that the MHC region and the evolutionary mechanisms shaping it have been highly conserved in the human and chimpanzee lineages. Our work also uncovered deep similarities between Western chimpanzees and smaller, isolated human populations most likely having undergone rapid genetic drift, independently of their geographic locations and genetic backgrounds, supporting a substantial effect of limited population sizes on MHC evolution in both species. We then proposed plausible scenarios for the molecular evolution of each Patr gene taking into account the strong selective sweep(s) that affected Patr genes after the ancient bottlenecks of Western chimpanzees that, curiously enough, did not substantially deplete their levels of MHC genetic diversity. These scenarios suggest that several Patr genes recovered allelic and/or nucleotide diversity after these bottlenecks thanks to the action of both balancing selection (DRB1, B, A) and rapid generation of polymorphism through recombination and/or gene conversion (DRB1, B). On the other hand, other loci kept a rather low diversity due to stronger directional or purifying selection and/or a slower process of molecular diversification through point mutations (DQB1, C), and some mixed processes also likely occurred (DPB1, DQA1). The possibility to substantially regenerate a high genetic diversity after a bottleneck, as originally proposed for Patr genes in this study, is essential for genes involved in immunity, like those of the MHC complex. 
Indeed, such a process is likely to restore the potential of a population to resist multiple infectious diseases and may thus be decisive for the long-term survival of critically endangered species like the chimpanzee.

Methods

Chimpanzee cohorts

The chimpanzee data include both wild-born (wb) and captive-born (cb) Western chimpanzees (P.t.verus, called chimpanzees hereafter) for which Patr analyses were previously published. The available data include four cohorts:

  1. BPRCwb, consisting of 29 wild-born individuals captured in Sierra Leone in the late seventies, which later founded the colony that was originally housed at the Biomedical Primate Research Centre (BPRC) [16, 58, 99]. According to mitochondrial and segregation analyses [108], all individuals appear to be unrelated.

  2. Yerkescb, consisting of 22 captive-born individuals from US institutions [59, 109, 110]. Relatedness between animals is unknown.

  3. Texascb, consisting of 23 captive-born individuals housed in US institutions [60, 111]. However, contrary to Yerkescb, this cohort may contain animals from different sub-species and/or hybrid animals (personal communication from the authors of [60]).

  4. Kumawb, consisting of 19 wild-born individuals (of unknown origin, captured in the seventies) that were previously housed in research institutions in Japan and later retired to the Kumamoto Primate Park, Japan [112,113,114]. Relatedness between animals is unknown.

We arranged the samples in two ways for the analyses: a) by considering separately the four cohorts defined above; and b) by grouping the individuals from BPRCwb, Yerkescb and Kumawb within a single cohort (called the “pooled cohort” hereafter). We did not include Texascb in the pooled cohort because of uncertainties regarding the represented sub-species.

The detailed information of each chimpanzee cohort is given in Additional Table S11.

Human populations

The human data are a subset of 50 to 89 population samples (depending on the locus) taken from the HLA-typed populations analysed in [21]. They represent 10 geographical regions (North Africa, South Africa, North America, South America, Europe, South-East Asia, North-East Asia, South-West Asia, Australia and Pacific). Based both on a previous paper using most of the same population samples as in this study [100] and on additional ethnological information [50], we defined each population as either RGD (meaning rapid genetic drift) or SGD (meaning slow genetic drift). RGD include small and isolated populations from different continents, mostly Indigenous populations from North and South America, Taiwan, Indonesia, Melanesia and Australia as well as populations from the Saharan region (e.g. Berber-speaking) and hunter-gatherers from Central Africa, all other populations being classified as SGD. All human populations were analysed separately from each other (i.e. never pooled as a single dataset but considered as multiple populations taken together when reporting the results) in the whole study. The detailed information of each human population sample is given in Additional Table S12.

MHC data

For both chimpanzee cohorts and human populations, the MHC data consist of multi-locus genotypes (including loci A, B, C, DRB1, DQA1, DQB1 and/or DPB1) composed of alleles defined at the 2nd field level of resolution according to the official nomenclatures of the IPD-MHC [115, 116] and IPD-IMGT/HLA [17] databases for Patr and HLA, respectively. At this resolution level, the alleles differ by one or more nucleotide substitutions that change the amino acid sequence of the MHC protein. Moreover, because these data result from exons 2 and 3 (for class I) or exon 2 (for class II) molecular typings, the assessed variation was restricted to the PBR (class I: full exons 2 and 3 sequences; class II: full exon 2 sequences).

A summary of the data used in this study is presented in Table 6.

Table 6 Summary of the chimpanzee and human population data at Patr and HLA genes, respectively

Statistical analyses

Allele frequencies and Hardy-Weinberg equilibrium

We estimated allele frequencies with an EM algorithm, the Gene-Counting Expectation Maximisation algorithm implemented in [117]. These estimates can be considered as population frequencies if Hardy-Weinberg equilibrium (HWE) is satisfied. We thus tested HWE in all populations by using a Likelihood Ratio Test (LRT) that compares the likelihood of frequencies estimated under HWE to the likelihood of those estimated under an inbreeding model.

Genetic diversity

We determined genetic diversity within each chimpanzee cohort or human population by using three different statistics: allelic richness ar, expected heterozygosity h, and nucleotide diversity π:

  1. Allelic richness ar was estimated by the number of alleles expected in a population sample of size equal to the rarefaction size 2n (i.e. the size of the smallest sample of n individuals at this locus) [118] as:

    $$ ar=\sum \limits_{\mathrm{i}=1}^{\mathrm{k}}\left[1-\frac{\binom{2\mathrm{N}-{\mathrm{N}}_{\mathrm{i}}}{2\mathrm{n}}}{\binom{2\mathrm{N}}{2\mathrm{n}}}\right] $$

where k is the number of alleles in the sample, 2n the rarefaction size and Ni the number of occurrences of the ith allele among the 2N sampled genes. Using this index is particularly appropriate when highly polymorphic genes like MHC are studied in samples of small sizes. Rarefaction sizes (2n) were 50 for A, 58 for B, 56 for C, 60 for DPB1, 58 for DQA1, 66 for DQB1 and 52 for DRB1 when allelic richness was estimated on the pooled cohort of chimpanzees and the different human population samples, and 44 for A, 44 for B, 44 for C, 38 for DPB1, 58 for DQA1, 32 for DQB1 and 34 for DRB1 when the four cohorts of chimpanzees were considered separately.

  2. Expected heterozygosity h (equivalent to Nei’s gene diversity, [119]) within a sampled population at HWE was computed according to:

    $$ h=1-\sum \limits_{\mathrm{i}=1}^{\mathrm{k}}{{\mathrm{p}}_{\mathrm{i}}}^2 $$

where k is the number of alleles and pi the frequency of the ith allele in the sample. Expected heterozygosity is not necessarily correlated to allelic richness since the latter is only influenced by the number of alleles and not by their frequency; for example, identical allelic richness may be observed in populations showing dissimilar heterozygosity (i.e. high heterozygosity due to the presence of many intermediate frequency alleles, as expected under balancing selection, or low heterozygosity due to the presence of one very frequent and many rare alleles, as expected under purifying selection).

  3. Contrary to the expected heterozygosity, nucleotide diversity π takes into account the number of nucleotide differences between alleles [119]. To compute this index, a DNA sequence (class I: exons 2 and 3; class II: exon 2) was first assigned to each allele by using the IPD-MHC and IPD-IMGT/HLA resources [115, 116, 120]. Nucleotide diversity was then estimated as:

    $$ \pi =\frac{\sum_{i=1}^k\sum \limits_{j<i}{p}_i{p}_j{d}_{ij}}{L} $$

where k is the number of alleles, L the number of sites in the sequence, pi and pj the frequencies of the ith and jth allele in the sample, respectively, and dij the number of nucleotide differences observed between alleles i and j. Nucleotide diversity is not necessarily correlated to expected heterozygosity; for example, identical heterozygosity may be observed in populations showing distinct genetic profiles where alleles are either molecularly very close (i.e. due to their slow diversification through rare point mutations) or molecularly very distant (i.e. due to their rapid diversification through recombination and/or gene conversion).

The three indices described above complement each other, as each conveys different information on the genetic diversity observed within a given cohort or population.
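As an illustration only, the three indices can be sketched in a few lines of Python following the formulas exactly as written above. This is not the code used in the study (which relied on the Gene[rate], Arlequin and Fstat programs). The function names and the toy allele counts, frequencies and sequences are hypothetical.

```python
from itertools import combinations
from math import comb


def allelic_richness(counts, rarefaction_size):
    """Expected number of distinct alleles in a subsample of `rarefaction_size`
    gene copies (2n) drawn without replacement from the 2N sampled copies."""
    total = sum(counts)  # 2N
    if rarefaction_size > total:
        raise ValueError("rarefaction size exceeds the number of sampled genes")
    denom = comb(total, rarefaction_size)
    # For each allele: 1 - probability that it is absent from the subsample.
    return sum(1 - comb(total - c, rarefaction_size) / denom for c in counts)


def expected_heterozygosity(freqs):
    """h = 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)


def nucleotide_diversity(freqs, seqs):
    """pi = (sum over pairs j < i of p_i * p_j * d_ij) / L, as written in the text."""
    length = len(seqs[0])  # L, assuming aligned sequences of equal length
    total = 0.0
    for i, j in combinations(range(len(seqs)), 2):
        d_ij = sum(a != b for a, b in zip(seqs[i], seqs[j]))
        total += freqs[i] * freqs[j] * d_ij
    return total / length


# Hypothetical toy data: three alleles observed 4, 3 and 1 times (2N = 8).
toy_counts = [4, 3, 1]
toy_freqs = [c / sum(toy_counts) for c in toy_counts]
toy_seqs = ["ACGT", "ACGA", "TCGA"]  # made-up 4-bp "alleles"

print(allelic_richness(toy_counts, 4))
print(expected_heterozygosity(toy_freqs))
print(nucleotide_diversity(toy_freqs, toy_seqs))
```

In this sketch, two samples with the same allele counts always give the same ar and h, whereas π also depends on how different the allele sequences are, which is why the three indices are reported together.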

Selective neutrality

To assess whether MHC genes are subject to significant selective pressures or behave as neutral markers, we searched for signals of natural selection by applying Slatkin’s version of the Ewens-Watterson selective neutrality test (named EWS test hereafter) based on allele frequencies [121,122,123,124] as implemented in [117]. The p-values obtained through the resampling process were adjusted for multiple testing using the False Discovery Rate (FDR) method [125]. The tests were done without prior assumptions, thus two-tailed rejection at the 5% level either occurs above 97.5% for excess of homozygotes or below 2.5% for excess of heterozygotes.

Linkage disequilibrium

As our study explores the genetic diversity at multiple MHC loci, we estimated both global linkage disequilibrium and proportions of haplotypes in significant linkage disequilibrium for all pairs of loci for which data were available. The assessment of global linkage disequilibrium was performed by means of a resampling procedure (named PRS, for Parametric Resampling Schema, hereafter) generating an empirical distribution for a likelihood ratio test (LRT) statistic based on the likelihood of allele and haplotype frequency estimates, the final result being the percentile of the observed LRT statistic (PRS) in the empirical distribution [106, 126]. Haplotypes in significant linkage disequilibrium were determined by a χ2 test (see Supplementary Text).

Genetic distances

We compared the Patr frequency distributions between each pair of chimpanzee cohorts by computing Prevosti’s genetic distances [127] according to:

$$ {D}_{P,Q}=\frac{1}{2}\sum \limits_{i=1}^k\left|{p}_i-{q}_i\right| $$

where pi and qi represent the frequencies of allele i in populations P and Q, respectively. The proportion of shared frequencies between cohorts was then estimated as the complement to 1 of Prevosti’s distance given in percentages.
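A minimal Python sketch of this distance and of the derived proportion of shared frequencies is given here for illustration. The function names and the toy frequency tables are hypothetical. Alleles absent from one cohort are treated as having frequency 0 there, which is an assumption of this sketch.

```python
def prevosti_distance(p, q):
    """Prevosti's distance between two allele-frequency dictionaries.

    Alleles missing from one dictionary are assumed to have frequency 0.
    """
    alleles = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in alleles)


def shared_frequency_percent(p, q):
    """Complement to 1 of Prevosti's distance, expressed as a percentage."""
    return 100.0 * (1.0 - prevosti_distance(p, q))


# Hypothetical allele frequencies in two cohorts (labels are made up).
cohort_p = {"allele_1": 0.5, "allele_2": 0.5}
cohort_q = {"allele_1": 0.25, "allele_3": 0.75}

print(prevosti_distance(cohort_p, cohort_q))
print(shared_frequency_percent(cohort_p, cohort_q))
```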

All frequency estimations and statistical analyses based on allele frequencies were performed using the hla-net (www.hla-net.eu) Gene[rate] tools [117]. Arlequin 3.5 [128] and Fstat [129] were used to estimate nucleotide diversity and allelic richness, respectively. When necessary, p-values were adjusted using Holm’s correction [130].
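For readers unfamiliar with Holm’s correction, the following Python sketch shows the standard step-down adjustment of a set of p-values. It is given for illustration and is not the code used in the study. The function name and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below a previous one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up p-values from three hypothetical tests.
print(holm_adjust([0.01, 0.04, 0.03]))
```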

Computer simulations

We checked the robustness of our results by controlling for the great discrepancy in sample sizes between chimpanzee cohorts and human populations through computer simulations using a resampling procedure. For each human population sample and each locus, we randomly drew 1000 sub-samples of the same size as the pooled cohort of chimpanzees (i.e. N = 44 for DPB1, 48 for DQB1, 29 for DQA1, 46 for DRB1, 51 for B, 51 for C and 50 for A), on which we tested Hardy-Weinberg equilibrium, estimated the three diversity indices, applied the selective neutrality test and assessed linkage disequilibrium.
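The resampling step can be sketched in Python as follows. This is an illustrative outline, not the procedure actually coded for the study. The function names and the toy genotypes are hypothetical, and sub-samples are assumed here to be drawn without replacement, which the text does not specify. Only expected heterozygosity is computed on each sub-sample, but any of the statistics above could be passed in the same way.

```python
import random
from collections import Counter


def heterozygosity_from_genotypes(genotypes):
    """Expected heterozygosity computed from a list of (allele, allele) genotypes."""
    alleles = [a for genotype in genotypes for a in genotype]
    n = len(alleles)
    counts = Counter(alleles)
    return 1 - sum((c / n) ** 2 for c in counts.values())


def resample_statistic(genotypes, n_individuals, statistic, n_draws=1000, seed=0):
    """Draw `n_draws` random sub-samples of `n_individuals` genotypes
    (without replacement, an assumption of this sketch) and apply `statistic`."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    return [statistic(rng.sample(genotypes, n_individuals)) for _ in range(n_draws)]


# Made-up genotypes standing in for one human population sample at one locus.
toy_population = [("a", "b"), ("a", "a"), ("b", "c"), ("c", "c"), ("a", "c"), ("b", "b")]

values = resample_statistic(toy_population, 4, heterozygosity_from_genotypes)
print(min(values), sum(values) / len(values), max(values))
```

The distribution of the statistic over the sub-samples can then be compared with the value observed in the chimpanzee cohort of the same size.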