Inter- and Intralocus Recombination Drive MHC Class IIB Gene Diversification in a Teleost, the Three-Spined Stickleback Gasterosteus aculeatus
The mutational mechanism underlying the striking diversity in MHC (major histocompatibility complex) genes in vertebrates is still controversial. In order to evaluate the role of inter- and intragenic recombination in MHC gene diversification, we examined patterns of nucleotide polymorphism across an exon/intron boundary in a sample of 31 MHC class IIB sequences of three-spined stickleback (Gasterosteus aculeatus). MHC class IIB genes of G. aculeatus were previously shown to be under diversifying (positive) selection in mate choice and pathogen selection experiments. Based on recoding of alignment gaps, complete intron 2 sequences were grouped into three clusters using maximum-parsimony analysis. Two of these groups had >90% bootstrap support and were tentatively assigned single locus status. Intron nucleotide diversity within and among loci was low (p-distance within and among groups = 0.016 and 0.019, respectively) and fourfold lower than the rate of silent mutations in exon 2, suggesting that noncoding regions are homogenized by frequent interlocus recombination. A substitution analysis using GENECONV revealed as many intergenic conversion events as intragenic ones. Recombination between loci may explain the occurrence of sequence variants that are particularly divergent, as is the case in three-spined stickleback, with nucleotide diversity attaining dN = 0.39 (peptide-binding residues only). For both MHC class II loci we also estimated the amount of intragenic recombination as population rate (4Ner) under the coalescent and found it to be approximately three times higher compared to point mutations (Watterson estimate per gene, 4Neμ). Nonindependence of molecular evolution across loci and frequent recombination suggest that MHC class II genes of bony fish may follow different evolutionary dynamics than those of mammals. 
Our finding of widespread recombination suggests that phylogenies of MHC genes should not be based on coding segments but rather on noncoding introns.
Keywords: Diversifying selection; Gene conversion; Major histocompatibility complex; MHC class II; Recombination; Sequence diversity; Three-spined stickleback
MHC (major histocompatibility complex) genes are prime examples of genes under diversifying (=positive) selection. Being the most polymorphic loci identified in vertebrate species thus far, MHC class I and II gene loci often display as many as several hundred alleles (Parham and Ohta 1996). Due to their crucial role in presenting pathogen-derived peptides to specialized cells of the immune system, a role of diversifying selection was proposed in 1975 (Doherty and Zinkernagel 1975). According to the “heterozygote advantage hypothesis,” MHC-heterozygous individuals recognize a wider array of foreign peptides, ultimately resulting in higher Darwinian fitness in the face of pathogen infections. An alternative, nonexclusive form of diversifying selection is frequency-dependent selection, in which individuals carrying rare alleles have a selective advantage under host–pathogen coevolution (Takahata and Nei 1990). Analyses of DNA sequence polymorphism found substantially elevated rates of nucleotide changes resulting in the replacement of an amino acid relative to silent mutations, in line with the above hypotheses of positive (=diversifying) selection (Hughes and Nei 1988).
Since then, experimental work supported molecular inferences. In mice and fish both sexual selection (Potts et al. 1991; Reusch et al. 2001a) and natural selection by parasites and pathogens (Langefors et al. 2001a; Penn et al. 2002; Wegner et al. 2003a,b) maintain high allelic polymorphism within populations. While this work explains the maintenance of MHC polymorphism given a diverse array of alleles, the ultimate mutational mechanism that generates more than 100 sequence variants at classical MHC loci is less clear (Martinsohn et al. 1999). It was originally posited that intralocus variation generally originates from point mutations (Klein 1986). However, if point mutations are the only generator of genetic variation in MHC genes, then the mutation rate should be extremely high, which has not been supported by sequence analyses (Satta et al. 1993). Alternatively, the divergence time between MHC alleles must be extremely long, with allelic lineages predating speciation (Klein 1986). Such “trans-species evolution” of MHC genes is most frequently found among mammalian species (Kupfermann et al. 1992; Kriener et al. 2000) but was subsequently also reported in fish (Figueroa et al. 2000) and birds (Sato et al. 2001).
Under the assumption of very old allele ages, point mutations would suffice for explaining allele diversity and divergence. However, the mosaic pattern among MHC genes, consisting of a complex assortment of various sequence motifs, suggests that the diversification of MHC sequence variants may also be a result of recombination sensu lato (Parham and Ohta 1996; Martinsohn et al. 1999). Possible mechanisms include gene conversion-like events and (repeated) microrecombination. Since many classical MHC genes occur as clusters of functionally intact, duplicated genes, interlocus recombination through unequal crossing-over may also generate sequence polymorphism (Ohta 1999). Throughout this paper, we do not distinguish between gene conversion and recombination. While the molecular mechanism of sequence exchange may be different, the observed pattern of polymorphism is similar for sequence data of limited length typically available in population data sets (Wiehe et al. 2000; Richman et al. 2003).
According to the “birth-and-death model” of MHC evolution (Nei and Hughes 1992; Gu and Nei 1999), MHC loci are duplicated repeatedly; some duplicates are maintained in populations for long times, whereas others lose their functionality. Models have shown that locus duplication and intergenic recombination may interact synergistically and lead to a higher sequence diversity in terms of particularly divergent sequence variants (Ohta 1999). Divergent alleles, in turn, may confer fitness advantages under the divergent allele advantage hypothesis (Wakeland et al. 1990; Richman et al. 2001, 2003), according to which heterozygous individuals carrying very different copies have a higher fitness than heterozygotes with little divergence.
Considerable progress has been made in detecting gene conversion from samples of DNA sequences (Sawyer 1999; McVean et al. 2002; Posada 2002; Stumpf and McVean 2003), and one hope is that better bioinformatic methods will advance the understanding of MHC evolution. A major new development is the application of coalescent theory (Hudson and Kaplan 1985) for estimating recombination rates from a sample of DNA sequences (Brown et al. 2001; Hudson 2001; Stumpf and McVean 2003). This way, not only the presence of recombination, but also its relative contribution can be estimated, substantially improving on the qualitative notions of recombination/gene conversion in earlier analyses of sequence polymorphism (Parham and Ohta 1996; see further examples in Martinsohn et al. 1999).
The goal of this paper is to explore and quantify the importance of recombination/gene conversion in relation to point mutations in shaping the diversification of MHC class IIB genes in populations of three-spined sticklebacks (Gasterosteus aculeatus). The MHC class IIB genes of this species currently build one of the most convincing cases for pathogen defense genes that are under diversifying selection. In both sexual and pathogen selection experiments, a crucial role for stickleback MHC class IIB genes in conferring resistance to pathogens and attractiveness to potential mating partners has been demonstrated (Reusch et al. 2001a; Wegner et al. 2003a, b). In addition, three-spined stickleback are increasingly becoming a model of evolutionary biology, including genomics (Reusch et al. 2004), calling for a closer analysis of those genes that have proven to play an important role in parasite defense and mate choice. An analysis of sequence evolution in nonmammalian taxa with respect to potential recombination is rare (but see Langefors et al. 2001b) but warranted given that the evolution of MHC genes may differ considerably among vertebrate classes (Edwards et al. 1995; Hess and Edwards 2002; Stet et al. 2003).
Materials and Methods
Sampling, Sequence Acquisition, and Molecular Methods
Supplementary Table S1
On average, we found 4.5 sequence variants per individual fish (maximum, 8), excluding a putative pseudogene with very divergent exon and intron sequences that was present in almost every fish analyzed (T.B.H. Reusch, unpublished data). Individual fish may thus possess between two and four functional MHC class IIB gene loci. We repeated cloning and sequencing in eight individuals to examine the reproducibility of cloning and sequencing. PCR artifacts may potentially create a pattern that will exactly resemble gene recombination in nature (Bradley and Hillis 1997). We found that most sequence variants were reproducible. In cases of nonidentical sets of sequences in both cloning runs, the additional MHC class IIB sequence clearly did not originate from a PCR artifact based on the distribution of shared and nonshared sequence motifs. Nevertheless, as a precaution against PCR artifacts during cloning, the majority of sequence variants (25/31) were obtained at least twice independently in different individuals.
We based our sequence characterization on genomic DNA because we wished to analyze nucleotide polymorphism and recombination patterns across the exon 2/intron 2 boundary, increasing the power for detecting recombination (Hughes 2000). We tested for the expression of the sequences by SSCP (single-strand conformation polymorphism) as described in Binz et al. (2001) and Reusch et al. (2001a). Genotyping was done in a subsample of individuals from which we simultaneously obtained cDNA and genomic DNA as a PCR template. We found that 14 of 16 identified signals were present in both cDNA and genomic DNA as template (Wegner et al. 2003a, b).
Identification of Putative MHC Class IIB Gene Loci
Sticklebacks are estimated to possess up to six different MHC class IIB gene loci (Sato et al. 1996; Reusch et al. 2001a). A previous analysis using a large insert library revealed that at least two of these loci show less than 2% sequence divergence in the introns and have originated by very recent gene duplication (<2 MYA) (Reusch et al. 2004). Nevertheless, we attempted to identify sequence groups indicative of loci by examining the sequence polymorphism of the entire intron 2, taking advantage of the information contained in the alignment gaps (i.e., indels). Indels of noncoding sequences may evolve faster than point mutations (Britten et al. 2003). In order to utilize information contained in indels, we used maximum parsimony (MP) analysis because this method allows for recoding alignment gaps as parsimony informative sites (Lee 2001). Prior to the analysis, sequences were aligned using the ClustalW algorithm, in combination with manual alignment. MP analysis was conducted using MEGA3 (Kumar et al. 2004), with 450 initial trees and under the close neighbor interchange (CNI) algorithm with a search factor of 3. The robustness of the resulting tree was assessed in 250 bootstrap runs. For gap coding we used the fragment coding approach (Lee 2001). All alignment gaps were recoded as present or absent (if there were no additional polymorphisms in the remaining sequence portions), or as multistate characters if there was additional substitution polymorphism in the stretches with nucleotides. In this way, we obtained 38 recoded state characters in addition to 32 parsimony informative sites consisting of point substitutions to be included in the MP analysis.
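The idea of recoding alignment gaps as characters can be illustrated with a minimal Python sketch. Note that it uses plain presence/absence coding of each distinct gap interval, which is a simplification and not the fragment coding of Lee (2001) applied in the analysis. The function names and the toy alignment are hypothetical.

```python
def gap_intervals(seq, gap="-"):
    """Return the set of (start, end) index pairs of maximal gap runs in one aligned sequence."""
    intervals = set()
    start = None
    for i, ch in enumerate(seq):
        if ch == gap and start is None:
            start = i                      # a gap run opens here
        elif ch != gap and start is not None:
            intervals.add((start, i - 1))  # the gap run closed at the previous column
            start = None
    if start is not None:                  # gap run extends to the end of the alignment
        intervals.add((start, len(seq) - 1))
    return intervals


def code_gaps(alignment):
    """Code every distinct gap interval as a binary character (1 = this exact gap is present)."""
    per_seq = {name: gap_intervals(seq) for name, seq in alignment.items()}
    all_gaps = sorted(set().union(*per_seq.values()))
    matrix = {
        name: [1 if g in gaps else 0 for g in all_gaps]
        for name, gaps in per_seq.items()
    }
    return all_gaps, matrix


# Hypothetical toy alignment, not real stickleback intron data.
toy = {"seq1": "ACGT--ACGT", "seq2": "ACGTTTACGT", "seq3": "AC----ACGT"}
gaps, matrix = code_gaps(toy)
```

Each column of `matrix` could then be appended to the substitution characters before a parsimony search; overlapping gaps of different length are treated here as independent characters, which fragment coding handles differently.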
We also used the information in exon 2 to construct a dendrogram based on inferred amino acid composition (neighbor-joining algorithm based on p-distance, Poisson corrected, implemented in MEGA3). This was done for comparative purposes, as most MHC sequence data sets use exon 2 to resolve allelic lineages, while sequence information from noncoding introns is often unavailable. We were particularly interested whether the tree topology based on coding exon 2 or noncoding intron 2 would be similar or different, as the latter outcome would indicate recombination across the exon–intron boundary. Also, it is qualitatively predicted that under frequent recombination, phylogenetic trees lack deep allelic lineages (Gu and Nei 1999).
Analyses of Sequence Polymorphism and Positive Selection
First, we quantified the sequence diversity among synonymous and nonsynonymous sites within the coding region (exon 2) using the software MEGA3 (Kumar et al. 2004). We tested for positive (diversifying) selection by comparing the number of nonsynonymous substitutions per nonsynonymous site (dN) with the silent substitution rate (dS). This analysis was performed separately for all 70 codons of exon 2 and for the 21 putative peptide-binding codons according to the crystal structure of the human MHC molecule (Brown et al. 1993). The excess of dN over dS was statistically examined using a Z-test (Hughes and Nei 1988).
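A Z-test of this kind can be sketched in a few lines of Python. The sketch assumes that the two estimates are independent and approximately normal and that their standard errors are already available (for example, from bootstrapping). The function name and the input values are hypothetical and are not taken from the data set.

```python
import math


def z_test_dn_ds(dn, se_dn, ds, se_ds):
    """Return the Z statistic and the one-sided p-value for the alternative dN > dS."""
    z = (dn - ds) / math.sqrt(se_dn ** 2 + se_ds ** 2)
    # Upper-tail probability of the standard normal distribution.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided


# Hypothetical illustrative values: dN = 0.40 +/- 0.10, dS = 0.06 +/- 0.03.
z, p = z_test_dn_ds(0.40, 0.10, 0.06, 0.03)
```

A large positive Z with a small p-value indicates an excess of nonsynonymous over synonymous substitutions, which is the signature of positive selection tested here.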
Second, we compared the silent polymorphism in exon 2 to the total nucleotide polymorphism in intron 2. In MHC genes, recombination across an intron–exon boundary will lead to very different patterns of polymorphism. Strong diversifying selection in exon 2 will simultaneously increase the polymorphism at silent sites. In contrast, in the adjacent intron, recombination will tend to make sequences more similar under selective neutrality. Therefore the nucleotide polymorphism will be decreased in introns relative to the synonymous substitution rate in exon 2 (Hughes 1999, 2000). Note that this does not imply that the recombination patterns vary between exonic and intronic portions of the gene (Hughes 1999). If intergenic exchange is also common, this will lead to homogenization of intron sequences across putative gene loci (Hughes 1999; Ohta 1999).
Detection of Recombination
We used three computer programs to detect recombination events. The program GENECONV (Sawyer 1999) employs a substitution model that scans for significant clustering of substitutions. Clusters are tested against a null hypothesis by permutation (10,000 runs). This method has a high statistical power of detecting gene conversion when it is actually present, while the risk of obtaining false positive results is low (Brown et al. 2001; Posada 2002). A global p-value that is adjusted for the number of comparisons is calculated, as well as a Bonferroni-adjusted p-value for each recombination event that was detected between any pair of sequences.
Table 1. Nucleotide polymorphism of MHC class IIB genes (±1 SE) in three-spined sticklebacks (Gasterosteus aculeatus)
Sequence group | Exon 2, all codons: dN | Exon 2, all codons: dS | Exon 2, PBR codons: dN | Exon 2, PBR codons: dS | Intron 2: p-distance
A (N = 16) | 0.181 ± 0.034** | 0.059 ± 0.018 | 0.42 ± 0.10*** | 0.065 ± 0.026 | 0.015 ± 0.003
B (N = 9) | 0.152 ± 0.027*** | 0.044 ± 0.015 | 0.34 ± 0.07*** | 0.030 ± 0.017 | 0.015 ± 0.003
C (N = 6) | 0.12 ± 0.024** | 0.053 ± 0.021 | 0.26 ± 0.07*** | 0.039 ± 0.030 | 0.011 ± 0.002
We used the recombination analyses provided in DNASP ver. 3.99 (Rozas et al. 2003), which calculates the minimum number of recombination events, ρM, according to Hudson and Kaplan (1985). DNASP was also used to locate the sites where putative recombination events took place.
Maximum Parsimony Analysis of MHC Class IIB Intron 2 Sequences
For comparison with the MP analysis based on the second intron, we also constructed a phylogenetic tree based on the inferred amino acid composition of the second exon (p-distance, Poisson corrected) using the neighbor-joining algorithm implemented in MEGA3. The topology of the resulting bootstrap consensus tree was very different from that based on the second intron. The sequence groups A through C identified using intron 2 polymorphism are distributed among several separate branches of the NJ tree, some of which have >80% bootstrap support at the branch tips, i.e., comprising two or three sequence variants only. The same qualitative result applied to an MP analysis of amino acid or base composition of exon 2 (data not shown).
Sequence Polymorphism in Coding and Noncoding Portions of MHC Class IIB Genes
Our sample of MHC class IIB genes may have been biased if some sequence types are too divergent to be amplified by our cloning primers. We therefore checked for a relationship between the chances of sequences to be amplified, which should be proportional to the mean sequence divergence in a genotype, and the number of sequences we recovered from each fish. Specifically, we computed the mean sequence divergence for all exon 2 positions using the Kimura two-parameter model and a transition:transversion ratio of 2 for each fish genotype that was sequenced to near-saturation (i.e., >10 clones per band, on average 34 sequences per fish). In this sample of 30 fish, we find no correlation between the mean sequence divergence and the number of unique sequence variants that could be recovered (r² = 0.03, p > 0.3).
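For readers unfamiliar with the Kimura two-parameter distance, the following Python sketch computes it for one pair of aligned sequences from the observed proportions of transitions (P) and transversions (Q), using the standard formula d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. It is a minimal illustration only: it skips sites with gaps or ambiguous bases, does not model the fixed transition:transversion ratio mentioned above, and the function name and toy sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue                      # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1              # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1            # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy pair with one transition (A/G) and one transversion (A/C) in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
```

Averaging such pairwise distances over all sequence variants of one fish would give a per-genotype mean divergence of the kind correlated with the number of recovered variants.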
We then examined sequence polymorphism for each of the three MHC class IIB sequence groups separately. Sequence diversity within the second exon was high, with 69 of 210 (33%) polymorphic sites. The average distance (p-distance ± SE, Poisson corrected) in terms of inferred amino acid sequences within groups A, B, and C ranged from 0.18±0.03 to 0.24±0.04. The magnitude of replacement mutations in putative codons of the PBR (peptide-binding region) was particularly high and attained a value of dN = 0.42 in sequence group A (Table 1). Differences between dN and dS throughout all three sequence groups were highly significant in a Z-test implemented in MEGA3, indicating positive selection. The majority of the polymorphic residues (19/21=90%) were found at positions identical to those involved in antigen binding of human MHC class IIB genes (data not shown; Brown et al. 1993).
The polymorphism of the intron 2 sequences was very different. Here, we found low nucleotide diversity, on average four times lower than the synonymous substitution rate within exon 2. This difference is statistically significant in all three identified sequence groups given the nonoverlapping confidence intervals (Table 1). Interestingly, the between-group polymorphism in the second intron (p-distance ± 1 SE) was very low (group A vs. B, 0.0154 ± 0.002; group A vs. C, 0.019 ± 0.003; group B vs. C, 0.0185 ± 0.003). Accordingly, the amount of polymorphism hardly increases when pooling all three groups (common p-distance = 0.016). This highlights the similarity of the intron 2 sequences outside the alignment gaps. Moreover, this finding stresses the essential information contained in the indels for uncovering the relatedness among intron 2 types using maximum parsimony.
Presence and Relative Amount of Recombination
All three methods detected statistically significant recombination. The substitution approach implemented in GENECONV yielded a low p-value for the global test (N = 31 sequences; 10,000 permutations, p < 0.001). In addition to a significant global test, GENECONV detected 22 pairwise recombination events that were significant at a Bonferroni-adjusted significance level of α ≤ 0.05. Note that this estimate is probably conservative because the large number of pairwise comparisons requires a correction of the nominal α-value by dividing it by 465. Of 31 sequences, 15 (=48%) were involved in at least one pairwise recombination event. We then examined which pairs of MHC sequences were involved in statistically significant recombination based on the groupings obtained by MP analysis. Half (11/22) of these events occurred within, and the other half between, putative MHC class IIB loci. Interestingly, no intralocus recombination was detectable within group B sequences, while eight exchanges of B with group A sequences were detected, all of which must be intergenic. Of four recombination events involving Canadian sequence variants, three involved one European sequence type.
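The divisor of 465 follows directly from the number of unordered sequence pairs among 31 sequences. A short Python sketch of this arithmetic is given below; the function names are hypothetical, and the sketch shows only the plain Bonferroni division, not GENECONV's internal procedure.

```python
def pairwise_comparisons(n_sequences):
    """Number of unordered pairs among n sequences: n(n - 1)/2."""
    return n_sequences * (n_sequences - 1) // 2


def bonferroni_threshold(alpha, n_tests):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_tests


n_pairs = pairwise_comparisons(31)                     # 31 * 30 / 2 pairs
per_test_alpha = bonferroni_threshold(0.05, n_pairs)   # roughly 1.1e-4
```

A single pairwise event therefore has to reach a nominal p-value of about 0.0001 to be counted, which is why the number of detected events is likely to be a conservative estimate.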
Composite-likelihood estimates of the population-wise recombination rate (ρ = 4Ner) and point mutation rate (Watterson estimate per gene = 4Neμ) at MHC class IIB loci in G. aculeatus, by MHC class IIB group, using the software LDHAT (McVean et al. 2002)
Recombination may reduce or promote genetic polymorphism depending on the selection regime of the target gene segment (Nei and Lee 1980; Strobeck 1983; Hughes 1999). Under directional selection or neutrality, recombination will lead to gene homogenization and concerted evolution. However, just the opposite effect may be observed under diversifying (positive) selection (Hughes 2000). MHC genes are prime examples for studying the contrasting effects of recombination because two opposing selective regimes shape the genetic polymorphism: positive selection in the second exon and neutrality in the second intron. The data set here was obtained from sticklebacks, one of the few species where diversifying selection has also been experimentally verified in mate choice and parasite selection experiments (Reusch et al. 2001a; Wegner et al. 2003a, b), strengthening any conclusions solely based on molecular genetic data.
Among a sample of 31 sequence variants, we found a fourfold higher substitution rate at synonymous sites in exon 2 compared to the adjacent intron 2. This is the pattern expected under gene conversion, where homogenization of introns relative to the synonymous substitutions in the exon will decrease the amount of sequence difference in this region (Hughes 1999, 2000; Ohta 1999). This finding alone is not sufficient for demonstrating gene conversion across loci but could simply be explained by a long-standing balanced polymorphism that is segregating within a single gene (e.g., Kreitman and Hudson 1991). However, the extremely limited polymorphism between putative MHC class II B loci (p-distance between groups ≤0.019) can be accounted for only if frequent intergenic exchange is assumed.
In support of this notion, we found statistically highly significant signals of intra- and intergenic recombination using two independent methodologies in populations of three-spined stickleback that may explain the observed differences in neutral polymorphism among coding and noncoding regions. Results obtained by a coalescent approach (LDHAT) suggest that the effects of intragenic recombination alone on MHC sequence polymorphism are approximately three times higher than the role of point mutations (4Neμ). In addition, we find that at least half of the total recombination detected in all sequence variants using a substitution approach is due to intergenic recombination events. This qualitatively doubles the role of recombination (intra- and intergenic) for generating MHC sequence polymorphism. The data set comprises sequence variants from Europe and Canada that diverged 2–5 × 10⁵ years ago (Orti et al. 1994). While the sample size of the Canadian fish is not very high, we qualitatively conclude from three recombination events detected between European and Canadian MHC sequences that some of the polymorphism in MHC class IIB is due to relatively old recombination events. This, however, does not rule out a dominant role for recombination in the rapid generation of novel allelic variation (Parham and Ohta 1996).
Since our data set comprised mostly MHC class IIB sequences that were at least twice independently confirmed in other individuals, PCR or cloning artifacts, for example, through Taq-polymerase template switching (Bradley and Hillis 1997), are probably not responsible for the high amount of recombination detected. The six sequence variants that were found only once are not involved more often in recombination than expected (4/22 pairwise recombination events detected in GENECONV). Nevertheless, our results may have been biased since we excluded many rare sequence variants that occurred only once. To examine this, we resampled 31 sequences from a total number of 64 distinct exon 2 sequences that are available from the 48 analyzed fish (singletons plus confirmed sequence variants in more than one individual). We repeated the analysis in GENECONV with 10 such resampling rounds. All data sets revealed highly significant signals of recombination (all global p’s < 0.001, 10,000 randomizations). In the pairwise analysis, we found on average a somewhat larger number of recombination events (26.9) in the resampled data sets, suggesting that we may have underestimated the true amount of recombination.
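The resampling scheme described above can be sketched as follows. This is a hedged illustration that assumes sampling without replacement within each round; the variant labels, the function name, and the fixed seed are hypothetical, and the subsequent GENECONV run on each resampled set is not shown.

```python
import random


def resample_rounds(pool, sample_size, n_rounds, seed=0):
    """Draw n_rounds random subsets of sample_size items each, without replacement within a round."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [rng.sample(pool, sample_size) for _ in range(n_rounds)]


# 64 distinct exon 2 sequences, represented here by placeholder labels.
pool = [f"variant_{i:02d}" for i in range(1, 65)]
rounds = resample_rounds(pool, sample_size=31, n_rounds=10)
```

Each element of `rounds` corresponds to one resampled data set of 31 sequences that would be analyzed separately.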
In order to detect recombination in a substitution approach, the homology criterion within a sample of sequence variants may be orthology or paralogy (Posada 2002). Thus, the results obtained using the substitution approach implemented in GENECONV are reliable, regardless of whether sequence variants were sampled from one or several loci. In contrast, the coalescent approach implemented in LDHAT and DNASP in order to obtain population estimates of the amount of recombination requires that all sequence variants come from one locus, which we achieved after MP analysis of the entire intron 2. Additional simplifying assumptions such as constant population size and no genetic structure also need to apply (Hudson 2001). Simulations by Fearnhead and Donnelly (2001) have revealed that the quotient ρ/θ as an estimate of the relative amount of recombination (ρ) compared to point mutations (θ) is robust against several violations of the underlying coalescent model, a panmictic population of constant size. First, any demographic dynamic that increases or decreases the effective population size (Ne) cancels out in 4Ner/4Neμ. Second, geographic population structure up to a fixation index (FST) of 0.2 was shown to yield rather conservative estimates of ρ/θ. FST in the stickleback populations investigated in this study has formerly been found to be around 0.2 (Reusch et al. 2001b). Hence, the population substructure should not cause any problem in our analyses.
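The cancellation of the effective population size can be written out explicitly. With ρ = 4Ner and θ = 4Neμ as defined in the text, the ratio reduces to the per-generation recombination rate relative to the mutation rate:

```latex
\[
\frac{\rho}{\theta} \;=\; \frac{4 N_e r}{4 N_e \mu} \;=\; \frac{r}{\mu}
\]
```

This holds as long as the same N_e applies to both estimates, which is why changes in population size affect ρ and θ individually but not their quotient.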
Two MHC class IIB sequence variants included in this data set were previously identified in a 100-kbp contiguous genomic segment. Hence, they belong unequivocally to two different loci that occur in tandem arrangement approximately 27 kbp distant from one another (Gaac-DAB and Gaac-DBB [Reusch et al. 2004]). Among the two specific alleles they carry, we could detect a pairwise recombination event (GENECONV, pairwise p = 0.007, Bonferroni correction applied) that by definition must be intergenic. In the larger sequence sample presented here, this isolated finding of interlocus recombination was confirmed.
Recombination between loci effectively decreases the phylogenetic signal that may be used for locus identification. Among the sequences analyzed here, the overall sequence similarity in the second intron was remarkably high (p-distance among all 31 sequences = 0.016), although the sample probably consists of four different class IIB loci. Had this been a sequence sample of any mammalian MHC gene, such a high similarity would indicate identical MHC class II locus affiliation. For example, in the human DRB locus, the intron divergence between allelic lineages of the same gene locus is three to four times larger than in sticklebacks (d = 0.06–0.08 [Bergström et al. 1998]). This result implies that at least some of the studied loci are the product of recent gene duplication. Alternatively, the locus duplication may be old, yet ongoing recombination homogenizes the introns such that similarities prevail. At least for the loci Gaac-DAB and -DBB the sequence polymorphism suggests that the duplication was indeed recent because the intron divergence is constant along the entire gene, while we would expect decreasing homogenization (hence more divergence) away from the site of strong positive selection, exon 2 (Reusch et al. 2004). To examine the relative role of ongoing intergenic recombination and the time since gene duplication, longer contiguous stretches of the MHC region are clearly needed (Wiehe et al. 2000).
Deep allelic lineages suggest that intergenic exchange in MHC class II genes of mammals is rare (Gu and Nei 1999). The only other quantitative analysis of gene conversion in a nonmammalian species was conducted in a sequence sample of Atlantic salmon, a species that possesses only one class IIB locus (Langefors et al. 2001b) and cannot undergo intergenic recombination. The importance of intragenic recombination in MHC class II genes has recently been stressed in deer mouse (Richman et al. 2003), where it contributes 12-fold more to sequence diversity than point mutations. For mammalian class II genes, experimental studies in sperm of mice and men revealed that the rate of gene conversion is surprisingly high and exceeds the rate of point mutations by three to four orders of magnitude (Högstrand and Böhme 1994; Zangenberg et al. 1995). In light of this experimental evidence, it is surprising that many authors still ignore the potential role of recombination. In order to explain some of the common motifs shared among MHC alleles, convergent evolution has frequently been invoked as an alternative explanation (Kriener et al. 2000; Figueroa et al. 2000). Under convergence, the same blocks of substitutions are found at identical sites in two or more independent alleles due to similar selection pressures. Yet, to produce shared motifs found in different MHC alleles, an enormous number of random substitutions is required. This, in turn, requires an extraordinarily high mutation rate or a very long time period. Several authors have shown that the mutation rate is not higher at MHC loci than at other genes (e.g., Satta et al. 1993). Yeager and Hughes (1999) estimated that for two unrelated ancestral sequences sharing an amino acid sequence motif of three residues, time intervals >20 MYA are required. Clearly, invoking convergence rather than evolution by recombination/gene conversion requires many more unrealistic assumptions.
MHC sequence evolution including longer stretches of noncoding gene segments has been studied in few other natural populations, and we are not aware of any study in fish. This is clearly a shortcoming, given that an analysis of paralogous and orthologous relationships among gene families such as MHC should rather be undertaken using noncoding segments of the gene (Bergström et al. 1998; Hughes 2000; Elsner et al. 2002). Phylogenetically supported clusters of exon 2 sequences, encompassing the peptide-binding region, have often been interpreted as evidence against a prominent role of recombination in shaping MHC variation (e.g., Kupfermann et al. 1992; Figueroa et al. 2000; Kriener et al. 2000; Sato et al. 2001). However, under intra- and intergenic recombination, the detection of phylogenetic branches under a point-mutation model is flawed and, therefore, cannot be used for identifying the homologous relationships among gene loci.
The evolution of MHC genes in sticklebacks, and possibly other fish species (Stet et al. 2003), may be remarkably different from that in well-studied mammalian species. In fact, two main features make MHC evolution in three-spined sticklebacks more similar to that found in birds than to that in mammals. First, the absence of locus-specific clustering of exon 2 and intron 2 and, second, the large variations in haplotypic gene number (Edwards et al. 1995; Hess and Edwards 2002; Westerdahl et al. 2000) markedly contrast with well-known mammalian examples. Two nonexclusive evolutionary models of MHC evolution may explain the lack of interlineage allelic divergence, the “recent duplication model” and the “gene conversion model” (Hess and Edwards 2002). The gene conversion model of evolution of the stickleback MHC loci is supported by differences in nucleotide diversity between transcribed and nontranscribed portions of the MHC class IIB genes, and by the frequency of recombination between more distantly related sequence variants. This does not rule out that, in addition to interlocus recombination, gene duplications occurred recently in stickleback. Duplications of MHC genes are more frequent in fish and bird species, as is also suggested by large interhaplotype differences in duplication number in several species (Malaga-Trillo et al. 1998; Hess and Edwards 2002).
Theoretical models predict that interlocus gene conversion can enhance the divergence between alleles (Ohta 1999). These expectations are consistent with the high mean nonsynonymous sequence polymorphism at peptide-binding residues up to dN ∼ 0.4 in our data. These values are in the range of the highest values for vertebrate populations, for example, the most polymorphic mammalian species, deer mouse (Richman et al. 2001). Interlocus recombination identified here among MHC class IIB loci may thus be a mechanism that is particularly important in light of the “divergent allele advantage,” i.e., when heterozygous individuals with more distantly related alleles have a fitness advantage over those with different but more similar alleles (Wakeland et al. 1990; Richman et al. 2003).
We thank H. Schaschl for many valuable comments on the manuscript, S. Carstensen, S. Liedtke, N. Ryk, C. Schmuck, and T. Sonntag for laboratory assistance, and M. Milinski for encouragement and support. Many thanks go to T. Reimchen for providing stickleback samples from British Columbia. TBHR thanks W. T. Stam for initially introducing him to indel alignment. TBHR was supported by Deutsche Forschungsgemeinschaft (DFG Re 1108/4 and -5). AL received a fellowship from the Swedish Research Council.
- Binz T, Reusch TBH, Wedekind C, Milinski M (2001) SSCP analysis of Mhc class IIB genes in the threespine stickleback. J Fish Biol 58:887–890
- Hess CM, Edwards SV (2002) The evolution of the major histocompatibility complex in birds. BioScience 52:423–431
- Hughes AL (1999) Adaptive evolution of genes and genomes. Oxford University Press, New York
- Klein J (1986) Natural history of the major histocompatibility complex. Wiley & Sons, New York
- Nei M, Hughes AL (1992) Balanced polymorphism and evolution by the birth-and-death process in the MHC loci. In: Tsuji K, Aizawa M, Sasazuki T (eds) 11th Histocompatibility Workshop and Conference. Oxford University Press, New York, pp 27–38
- Orti G, Bell MA, Reimchen TE, Meyer A (1994) Global survey of mitochondrial DNA sequences in the threespine stickleback: evidence for recent migrations. Evolution 48:608–622
- Richman AD, Herrera LG, Nash D, Schierup MH (2003) Relative roles of mutation and recombination in generating allelic polymorphism at an MHC class II locus in Peromyscus maniculatus. Genet Res Cambridge 82:89–99
- Sawyer SA (1999) Geneconv: a computer package for statistical detection of gene conversion. Code available at http://www.math.wustl.edu/~sawyer