Current perspectives on the intensity of natural selection of MHC loci

Yasukochi, Yoshiki; Satta, Yoko

doi:10.1007/s00251-013-0693-x

Brief Communication
Open access
Published: 03 April 2013

Volume 65, pages 479–483, (2013)

Yoshiki Yasukochi¹ &
Yoko Satta¹

Abstract

Polymorphism of genes in the major histocompatibility complex (MHC) is believed to be maintained by balancing selection. However, direct evidence of selection has proven difficult to obtain. In 1994, Satta and colleagues estimated the selection intensity at the human MHC (human leukocyte antigen (HLA)) loci; at that time, however, the number of available HLA sequences was limited. By comparing five different methods in a computer simulation study, that study identified the best way to calculate the selection coefficient. Since then, many more HLA nucleotide sequences have become available. Our new analysis takes advantage of these sequences and compares the new estimates with those of the previous study. In general, our new results are consistent with those of the 1994 study. They show that, even after 20 years of exhaustive sequencing of human HLA, the number of dominant HLA alleles, on which our original estimate of selection intensity depended, appears to be conserved. Indeed, according to the frequency distribution of alleles at each HLA locus, most sequences in the database are minor or private alleles. We therefore conclude that the selection intensities of HLA loci are at most 4.4 %, even though HLA is a prominent example of loci on which natural selection has been operating.

The extensive polymorphism of major histocompatibility complex (MHC) genes is believed to be maintained by balancing selection for the extent of the peptide-binding repertoire between individuals (Hughes and Nei 1988, 1989; Takahata and Nei 1990; Hughes and Yeager 1998). A unique effect of balancing selection is the long persistence time of alleles in populations and, consequently, trans-species polymorphism (Klein 1987; Takahata 1990; Takahata et al. 1992; Klein et al. 1998, 2007). However, it is difficult to obtain direct experimental evidence of such selection and to measure selection intensity directly. Satta et al. (1994) estimated the intensity of selection at the human MHC (human leukocyte antigen (HLA)) loci by using the then-available collection of allelic sequences and a simple model based on symmetric overdominant selection and the theory of allelic genealogy (Kimura and Crow 1964; Takahata 1990; Takahata and Nei 1990; Takahata et al. 1992).

In recent years, a large number of HLA allelic nucleotide sequences have become available through the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla/, Robinson et al. 2011). As of 2012, the database contains 7,670 alleles. This large dataset provides an opportunity to estimate evolutionary parameters, such as the intensity of natural selection, more reliably. Hence, we re-estimated the selection coefficient and compared the estimates with those of the previous study, which was based on a limited number of sequences (Satta et al. 1994).

Nucleotide sequences of the six functional HLA loci (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1, and HLA-DPB1), which play important roles in peptide presentation, were obtained from the IMGT/HLA database. In addition, nucleotide sequences of alleles at the HLA class II A (DQA1 and DPA1) and class II B (DRB3 and DRB5) loci were used in this analysis. Because the inclusion of recombinants leads to a biased estimate of the selection intensity, possible recombinant alleles were excluded by using the method described by Satta (1992). This method assumes that, given the number of substitutions in the entire region, the number of substitutions falling in a particular region is binomially distributed. At the HLA-B locus, the exceptionally divergent HLA-B*73:01 allele (Abi-Rached et al. 2011), which might have been transmitted to extant humans from a distinct Homo lineage by interbreeding, was also excluded. Because we applied the theory of allelic genealogy under symmetric overdominant selection, we used only dominant alleles, that is, alleles with a frequency >1 % across various human populations (the NCBI dbMHC database, http://www.ncbi.nlm.nih.gov/gv/mhc, Meyer et al. 2007). We also excluded nucleotide sequences containing long stretches of undetermined nucleotides (Table 1). As a result, the number of alleles used in this analysis was limited to 9 HLA-A alleles, 19 HLA-B, 20 HLA-C, 25 HLA-DRB1, 13 HLA-DQB1, 10 HLA-DPB1, 6 HLA-DQA1, 3 HLA-DPA1, 13 HLA-DRB3, and 5 HLA-DRB5. These HLA alleles are listed in Online Resource 1. Interestingly, most of the very large number of nucleotide sequences in the current database represent minor or private alleles.
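The binomial assumption behind the recombinant screen can be illustrated with a minimal Python sketch. It is not the procedure of Satta (1992), whose details are not given here. It only shows how, under that assumption, one might ask whether the substitutions between two alleles are unevenly distributed between a sub-region and the entire region. The function names, the two-sided test, and any significance cutoff applied to the result are assumptions made for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_excess_pvalue(subs_in_region, subs_total, region_len, total_len):
    """Two-sided binomial p-value for the share of substitutions in a sub-region.

    Assumption (illustrative): if substitutions are spread uniformly, each of
    the subs_total substitutions falls in the sub-region with probability
    region_len / total_len.
    """
    p = region_len / total_len
    upper = binomial_tail(subs_in_region, subs_total, p)            # P(X >= k)
    lower = 1.0 - binomial_tail(subs_in_region + 1, subs_total, p)  # P(X <= k)
    return min(1.0, 2 * min(upper, lower))


# Hypothetical allele pair: all 10 substitutions lie in a 100-bp stretch
# of a 1,000-bp alignment, which is very unlikely under uniformity.
print(region_excess_pvalue(10, 10, 100, 1000))
```

A very small p-value for some allele pair would mark one of the alleles as a candidate recombinant to be inspected; which alleles are finally excluded depends on criteria not specified in this text.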

Table 1 The number of alleles and dominant alleles in the database

According to the theory described in Takahata (1990) and Takahata et al. (1992), two estimators, γ and K_B, must be calculated to estimate the selection coefficient s. The estimator γ is the ratio of the number of nonsynonymous substitutions per peptide-binding region (PBR) site to the number of synonymous substitutions per site among given pairs of alleles, whereas K_B is the mean number of pairwise nonsynonymous substitutions in the PBR. The numbers of synonymous and nonsynonymous sites were estimated using the modified Nei–Gojobori method (Zhang et al. 1998) with the Jukes–Cantor correction (Jukes and Cantor 1969). Because balancing selection accelerates the nucleotide substitution rate, the number of nonsynonymous substitutions in the PBR reaches a ceiling relatively early; Satta et al. (1994) therefore developed five methods for estimating K_B and evaluated them by computer simulations. Here, we used method II because it minimized errors in the multiple-hit correction (Satta et al. 1994). In this method, selection coefficients can be adequately estimated by using only sets of sequences that are relatively closely related.
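As a minimal illustration of the quantities defined above, the following Python sketch applies the standard Jukes–Cantor correction, d = −(3/4) ln(1 − 4p/3), to the proportion of differing sites and forms γ for a single pair of alleles. The counting of synonymous and nonsynonymous sites by the modified Nei–Gojobori method is not implemented; the counts are taken as inputs, and the function names and example numbers are hypothetical. How γ is combined across allele pairs, and how method II selects closely related pairs, is not specified here and is not modeled.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def gamma_ratio(nonsyn_diffs_pbr, nonsyn_sites_pbr, syn_diffs, syn_sites):
    """Ratio of nonsynonymous substitutions per PBR site to synonymous substitutions per site.

    All four arguments are assumed to come from a prior site-counting step
    (for example, a modified Nei-Gojobori implementation) for one allele pair.
    """
    d_n = jukes_cantor(nonsyn_diffs_pbr / nonsyn_sites_pbr)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ValueError("no synonymous substitutions: ratio is undefined")
    return d_n / d_s


# Hypothetical counts for one allele pair.
print(gamma_ratio(nonsyn_diffs_pbr=12, nonsyn_sites_pbr=120, syn_diffs=6, syn_sites=200))
```

A ratio above 1 for such a pair indicates that nonsynonymous changes in the PBR have accumulated faster per site than synonymous changes.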

The estimated values of K_B and γ at the six major HLA loci described above are provided in Table 2. Using these values, we obtained two further estimators, M and S, which are also necessary for estimating the selection coefficient, s (see Satta et al. 1994). Assuming that the long-term effective population size of humans is 10⁵, the s values of the HLA-B and HLA-DRB1 loci (s = 4.4 and 1.9 %, respectively) were the highest among the class I and class II loci, respectively. This result is consistent with that of the previous study (Satta et al. 1994). All s values were broadly similar to those of the previous study, with the exception of the DQB1 and DPB1 loci: the current estimate for DQB1 was lower than the previous estimate, and the value for DPB1 was much higher (Satta et al. 1994). One possible reason is that the set of nucleotide sequences used here differs from that of the previous study. Indeed, for both DQB1 and DPB1, the number of dominant alleles used in the present analysis is larger than in the previous one.

Table 2 Estimates of the mean number of nonsynonymous substitutions, the relative nonsynonymous substitution rate in the PBR, and the selection coefficient (s)

Allelic genealogy predicts that K_B is approximately equal to the number of dominant alleles (n_a) in a population. Indeed, n_a agreed well with K_B at three class II B loci (Table 2). Among the class I loci, HLA-C showed relatively good agreement between n_a and K_B, whereas at the HLA-A and HLA-B loci the observed number of dominant alleles was smaller than expected. This discrepancy might indicate that the definition of dominant alleles is inappropriate for class I loci. Originally, we regarded an allele with a frequency of more than 1 % over all populations examined as a dominant allele. According to the dbMHC database, the number of chromosomes examined at all three class I loci was more than 10,000 in total, varying from allele to allele. Thus, we took 100 chromosomes (1 % of 10,000 chromosomes) as the threshold for a class I dominant allele. In addition, the mean number of populations in which class II dominant alleles were observed was about 25. Therefore, for class I loci, we regarded alleles detected on >100 chromosomes across >25 populations as dominant alleles. Surprisingly, n_a of the class I loci under this new definition agreed well with K_B (Table 2). This might imply that some of these alleles, although now at <1 % frequency in the entire world population, were widely distributed throughout the human population until quite recently and have since decreased in frequency, possibly because they were replaced by other alleles that had an advantage in the modern environments of some populations. The number of different dominant alleles in the PBR also agrees well with expectations (Table 1). After the exclusion of possible recombinants, the numbers were 26 at HLA-A, 39 at HLA-B, and 19 at HLA-C. However, when rare alleles were included, these numbers increased to 32, 113, and 60, respectively. The number of rare alleles carrying de novo nonsynonymous mutations in the PBR is large, and they may have emerged through a quite recent population expansion (Fu et al. 2013).
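The revised class I criterion (detected on >100 chromosomes across >25 populations) amounts to a simple filter over per-allele counts. The Python sketch below shows one way to express it; the record layout, the allele names, and the counts are hypothetical and do not reflect the dbMHC data format or actual values.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    name: str          # allele label (hypothetical placeholder names below)
    chromosomes: int   # number of chromosomes on which the allele was detected
    populations: int   # number of populations in which the allele was observed


def is_class1_dominant(allele, min_chromosomes=100, min_populations=25):
    """Apply the revised class I definition: >100 chromosomes and >25 populations."""
    return allele.chromosomes > min_chromosomes and allele.populations > min_populations


# Hypothetical counts for three alleles.
alleles = [
    AlleleCounts("allele_1", chromosomes=450, populations=60),
    AlleleCounts("allele_2", chromosomes=130, populations=12),
    AlleleCounts("allele_3", chromosomes=40, populations=30),
]

dominant = [a.name for a in alleles if is_class1_dominant(a)]
print(dominant)       # names of alleles passing both thresholds
print(len(dominant))  # n_a under this definition, to be compared with K_B
```

Both thresholds must be exceeded, so an allele that is common in only a few populations, or widespread but seen on few chromosomes, is not counted toward n_a.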

In addition to the above estimates, we estimated the selection coefficients for DRB3, DRB4, DRB5, DQA1, and DPA1 (Table 2). With the exception of DRB4 (see below), the selection coefficients (s) of these four HLA class II loci were all lower than those of the six major HLA loci (HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1, and HLA-DPB1), indicating that the six major loci have been more strongly affected by balancing selection. The present s estimate for DQA1 is lower than the previous one, but the present K_B value is similar to n_a; we therefore consider the present estimate to be close to the true value. For DRB4, 15 alleles were deposited in the database; they are identical at the PBR sites and nearly identical at the neutral (synonymous and non-PBR nonsynonymous) sites. Thus, inference of the γ and K_B values is difficult. The relatively recent emergence of DRB4 (the per-site nucleotide divergence from DRB2 is 0.015–0.017; Satta et al. 1996) is consistent with this observation. In addition, the small amount of nucleotide divergence at neutral sites for DRB4 indicates a relatively small effective population size for DRB4. This suggests that the frequency of the DR53 haplotype, on which DRB4 resides, is lower than that of other HLA haplotypes. DRB3 and DRB5 also show smaller effective sizes than other HLA loci (the estimated N_e values of DRB3 and DRB5 are considerably smaller than 10⁵). This is likewise because DRB3 and DRB5 are located on a limited set of DR haplotypes, whereas the other HLA loci exist in all humans.

Our findings show that, although the number of sequences in the database has greatly increased in the past 20 years, most of the accumulated sequences are minor or private alleles, and the number of dominant alleles has not changed greatly since the previous estimation. Consequently, most of the selection coefficients estimated for the six major HLA loci in the present study were similar to those of the previous study. One might consider the assumption of symmetrical overdominance too strict for the actual data. However, the simulation study by Takahata and Nei (1990) revealed that the asymmetrical overdominance model does not fit the mode of polymorphism in actual data: for a given selection coefficient, the number of alleles and the average heterozygosity under the asymmetrical model are smaller than those under the symmetrical overdominance model. Indeed, the number of dominant alleles at all HLA loci was consistent with the K_B values under symmetrical overdominance, suggesting that our assumed model is consistent with the actual data. Therefore, the symmetrical overdominance model is appropriate for the present estimation. Through this analysis, we confirmed that the selection intensity (selection coefficient, s) of HLA loci in modern humans is at most 4.4 %, even though HLA is a prominent example of loci on which natural selection acts.

References

Abi-Rached L, Jobin MJ, Kulkarni S et al (2011) The shaping of modern human immune systems by multiregional admixture with archaic humans. Science 334:89–94
Fu W, O’Connor TD, Jun G et al (2013) Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493:216–220
Hughes AL, Nei M (1988) Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335:167–170
Hughes AL, Nei M (1989) Nucleotide substitution at major histocompatibility complex class II loci: evidence for overdominant selection. Proc Natl Acad Sci U S A 86:958–962
Hughes AL, Yeager M (1998) Natural selection at major histocompatibility complex loci of vertebrates. Annu Rev Genet 32:415–435
Jukes T, Cantor C (1969) Evolution of protein molecules. In: Munro H (ed) Mammalian protein metabolism. Academic, New York, pp 21–132
Kimura M, Crow JF (1964) The number of alleles that can be maintained in a finite population. Genetics 49:725–738
Klein J (1987) Origin of major histocompatibility complex polymorphism: the trans-species hypothesis. Hum Immunol 19:155–162
Klein J, Sato A, Nagl S, O’hUigin C (1998) Molecular trans-species polymorphism. Annu Rev Ecol Systemat 29:1–21
Klein J, Sato A, Nikolaidis N (2007) MHC, TSP, and the origin of species: from immunogenetics to evolutionary genetics. Annu Rev Genet 41:281–304
Meyer D et al (2007) Single locus polymorphism of classical HLA genes. In: Hansen JA (ed) Immunobiology of the human MHC: Proceedings of the 13th International Histocompatibility Workshop and Conference, vol. I. IHWG, Seattle, WA, pp 653–704
Robinson J, Mistry K, McWilliam H, Lopez R, Parham P, Marsh SGE (2011) The IMGT/HLA database. Nucleic Acids Res 39:D1171–1176
Satta Y (1992) Balancing selection at HLA loci. In: Takahata N (ed) In The Proceedings of the 17th Taniguchi Symposium. Japan Science Society, Tokyo, pp 111–131
Satta Y, O’hUigin C, Takahata N, Klein J (1994) Intensity of natural selection at the major histocompatibility complex loci. Proc Natl Acad Sci U S A 91:7184–7188
Satta Y, Mayer WE, Klein J (1996) Evolutionary relationship of HLA-DRB genes inferred from intron sequences. J Mol Evol 42:648–657
Takahata N (1990) A simple genealogical structure of strongly balanced allelic lines and trans-species evolution of polymorphism. Proc Natl Acad Sci U S A 87:2419–2423
Takahata N, Nei M (1990) Allelic genealogy under overdominant and frequency-dependent selection and polymorphism of major histocompatibility complex loci. Genetics 124:967–978
Takahata N, Satta Y, Klein J (1992) Polymorphism and balancing selection at major histocompatibility complex loci. Genetics 130:925–938
Zhang J, Rosenberg HF, Nei M (1998) Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci U S A 95:3708–3713

Acknowledgments

This work was supported by a Grant-in-Aid for Scientific Research on Innovative Areas from the Ministry of Education, Culture, Sports, Science and Technology of Japan (22133007). The authors thank John A. Eimes for critically checking the English of this manuscript. We owe special thanks to Naoyuki Takahata for providing valuable comments.

Author information

Authors and Affiliations

Department of Evolutionary Studies of Biosystems, The Graduate University for Advanced Studies (SOKENDAI), Shonan Village, Hayama, Kanagawa, 240-0193, Japan
Yoshiki Yasukochi & Yoko Satta

Authors

Yoshiki Yasukochi
Yoko Satta

Corresponding author

Correspondence to Yoshiki Yasukochi.

Electronic supplementary material

ESM 1

(PDF 18.6 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

About this article

Cite this article

Yasukochi, Y., Satta, Y. Current perspectives on the intensity of natural selection of MHC loci. Immunogenetics 65, 479–483 (2013). https://doi.org/10.1007/s00251-013-0693-x

Received: 29 December 2012
Accepted: 05 March 2013
Published: 03 April 2013
Issue Date: June 2013
DOI: https://doi.org/10.1007/s00251-013-0693-x
