Background

The Major Histocompatibility Complex (MHC) plays a central role in the immune system of all jawed vertebrates. It is the most polymorphic genomic region identified, and encodes proteins involved in the innate and adaptive immune responses[1, 2]. In particular, the MHC Class I and Class II genes encode proteins that bind small antigen peptides and carry them to the cell surface, thus presenting them to cytotoxic T cells or helper T cells. This in turn triggers the downstream immune cascade. This genomic region is therefore crucial for an organism’s resistance and susceptibility to pathogenic disease[2].

Despite its functional consistency, the MHC genomic cluster is organized differently in different organisms. The latest genomic map of the human MHC (HLA) spans about 7.6 Mb and contains 421 gene loci in a contiguous region on chromosome 6[3], whereas the MHC regions of other organisms generally differ in gene order and size, or are even scattered on separate chromosomes[4–6]. Notably, the chicken (Gallus gallus) has two genetically independent MHC clusters, MHC-B and MHC-Y (previously Rfp-Y). Both are located on microchromosome 16 (GGA16)[7–11]. There is some evidence that genes in the MHC-Y region are expressed and play a role in disease susceptibility, but MHC-B is believed to be the main functional MHC genomic region of the chicken[12–15]. The highly streamlined MHC-B, which includes genes encoding Class I and Class IIB molecules, contains only 19 genes and is about 92 kb in length[14–16]. Sequencing efforts have also been made on other bird species, such as mallard duck, red-winged blackbird, house finch and zebra finch[17–21]. However, none of these species seems to share the characteristics of the minimal essential chicken MHC.

The chicken and other fowl species belong to the order Galliformes. Available MHC maps of other galliform birds generally show the same compact organization of this genomic region as in chicken. For example, the MHC-B of the turkey (Meleagris gallopavo) shows good synteny with the chicken MHC-B, the only exceptions being that the turkey MHC-B has more BG and BLB (MHC Class IIB) gene copies and an inversion of the TAPBP gene[22]. The quail (Coturnix japonica) MHC-B includes an expanded number of duplicated genes, and the number of duplicated loci also varies to some extent among individuals[23, 24]. The MHC-B of the golden pheasant (Chrysolophus pictus) also shows good synteny with that of chicken, but has two inversions, of TAPBP and of TAP1-TAP2[25].

Black grouse (Tetrao tetrix) is a wild galliform bird species that has been well studied from an ecological perspective, including conservation genetics, behavioural ecology, sexual selection and the evolution of the lek mating system[26–28]. Previous work on the black grouse MHC identified the MHC-B and MHC-Y genomic loci, and the polymorphism of the second exon of the MHC Class IIB gene has been surveyed at the population level[29–31]. In this paper, we investigate the detailed genomic organization of the black grouse MHC-B region. We constructed a fosmid library to sequence the MHC-B genomic cluster and used Roche 454-transcriptome sequencing (RNA-Seq) to verify the expression of the identified genes[32]. The results allow us to conduct a comprehensive comparative genomics analysis of the galliform MHC region. Because genomic data on avian MHC regions have been lacking, this kind of analysis has not previously been feasible. The black grouse MHC sequence, together with four other completely characterized galliform MHC regions, thus offers a unique opportunity in bird MHC studies.

Results

Sequence of the black grouse MHC-B region

Four overlapping MHC-bearing fosmid clones with lengths of 29,972–40,168 bp were identified and sequenced (Figure 1A). They were aligned into a consensus sequence of 88,390 bp (GenBank accession number JQ028669). This sequence covers the majority of the black grouse MHC-B region (including the complete “core” MHC region), from the BTN1 gene to the CYP21 gene. Since the sequenced black grouse was a wild, non-inbred animal, we found clones from both homologous chromosomes. More specifically, P2D1 was found to be from a different chromosome than the other three clones (Figure 1A). To maximize the chance of obtaining a true complete haplotype of the black grouse MHC, we used the combined sequences of P3B2 and P5B8 as the consensus sequence for the heterozygous parts. Our black grouse MHC sequence is therefore for the most part a true haplotype, apart from the small gap (1,872 bp) between P3B2 and P5B8, which was covered only by P2D1. Sequencing both homologous chromosomes gave us the opportunity to identify polymorphisms in the heterozygous parts. In the heterozygous overlap (25,345 bp) of P3B2 and P2D1, we found 275 single nucleotide polymorphisms (SNPs) and 31 deletion-insertion polymorphisms (DIPs). In the much smaller overlap (2,693 bp) of P2D1 and P5B8, we found 3 SNPs and 2 DIPs (Additional file 1).

Figure 1

Sequence features of the black grouse MHC-B region. A. Position of the sequenced fosmid clones. Dotted lines indicate the heterozygous parts. B. Gene annotation of the MHC-B of black grouse. Different shading indicates different MHC gene families as defined for the human MHC. From dark to light: Class I, Class II, Class III, others. C. Average 454 sequencing coverage per nucleotide for each expressed region. D. Positions of repetitive elements and tRNAs. E. CpG islands in 100 bp windows. F. GC content in 200 bp windows.

Five chicken repeats (CR) were identified, of which CR1-F and CR1-X1 were also found to match the chicken MHC-B. We also found 14 simple sequence repeats (SSRs, microsatellites) in the black grouse MHC-B region (Figure 1D, Additional file 2). The average GC content of the black grouse MHC-B region is 59.0%, which is high, like that of the chicken (55.5%) (Figure 1F). This is probably because the region we sequenced lies in the gene-dense BF/BLB region, which has a higher GC content than the other regions. The black grouse MHC also has a high density of CpG islands (Figure 1E), which may indicate the functional importance of this region[33].

Gene identification and verification

All three gene prediction programs identified most of the genes located in the black grouse MHC-B, and most of the chicken, turkey and golden pheasant MHC genes aligned well with their homologues in the black grouse MHC-B. As a result, 18 genes, namely BTN1 (partial), BTN2, Blec2, Blec1, BLB1, TAPBP, BLB2, BRD2, DMA, DMB1, DMB2, BF1, TAP2, TAP1, BF2, C4, CenpA and CYP21 (partial), were confirmed by at least three of the above approaches (Table 1). The only exception was the gene BG1: Fgenesh and Genscan did not identify this gene, and the comparisons with chicken and turkey gave inconsistent results. The annotation of this gene is therefore based only on the GeneMark prediction and was checked manually.

Table 1 Features of the coding sequences of black grouse MHC-B genes and sequence comparisons with homologous genes in chicken, turkey, quail and pheasant

From our RNA-Seq data, 480 reads could be mapped onto 17 predicted genes in the black grouse MHC-B region, with an average mapped contig length of 209.4 bp. That is, 17 of the 19 predicted genes (all except BTN2 and CenpA) had concrete evidence of gene expression (Figure 1C). The expression levels of the verified genes were variable. For example, BTN1, DMB2 and TAPBP were highly expressed, with mean sequence coverage per nucleotide of 34.6, 23.0 and 21.5, respectively (Table 1). The MHC Class I and Class IIB genes also had high levels of expression: the sequencing coverage per nucleotide of BF2, BLB1 and BLB2 was 16.1, 18.1 and 12.2, respectively. In contrast, the genes BG1, Blec1, DMB1, TAP1 and CYP21 each had only a single transcript read mapped. Within genes, there was a strong 3′ bias (including the untranslated region) in the number of transcripts mapped; this is likely due to the technical nature of the cDNA library preparation[34]. The lack of verification for some exons may also be an artefact of the library preparation, limited sequencing depth or the data analysis strategy, and does not necessarily mean that these exons are not expressed[32].
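As an illustration of how a mean coverage per nucleotide value of this kind can be derived, the following minimal Python sketch divides the number of read bases overlapping a region by the region length. It assumes that mapped reads are available as half-open (start, end) intervals on the consensus sequence; the function name and the toy coordinates are hypothetical and are not taken from the study or from gsMapper output.

```python
def mean_coverage(region, reads):
    """Mean read depth per nucleotide over a half-open region (start, end)."""
    start, end = region
    if end <= start:
        raise ValueError("region must have positive length")
    covered_bases = 0
    for r_start, r_end in reads:
        # Count only the part of each read that overlaps the region.
        overlap = min(end, r_end) - max(start, r_start)
        if overlap > 0:
            covered_bases += overlap
    return covered_bases / (end - start)


# Toy example (hypothetical coordinates): a 100 bp region and three reads.
toy_reads = [(0, 50), (25, 75), (90, 150)]
print(mean_coverage((0, 100), toy_reads))
```

In the toy example, the three reads contribute 50, 50 and 10 overlapping bases to a 100 bp region.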

Comparative genomics of the galliform MHC-B

The black grouse MHC-B genomic region shares almost perfect synteny with that of chicken: the gene numbers and gene orders of the two species are identical (Figure 2). Compared to the turkey MHC-B, the black grouse MHC-B has fewer BG genes and fewer BLB genes, but the MHCs of the two species are still highly similar. The golden pheasant MHC-B also has more BLB genes than that of black grouse (Figure 3). The quail MHC-B has significant expansions of BLB and BF genes and has some pseudogenes scattered in this region, but the black grouse MHC-B is still clearly syntenic with it.

Figure 2

Identity dot-matrix plots of the nucleotide sequence of the black grouse MHC-B region against itself (left) and against that of chicken (right). Different shading of genes indicates different MHC gene families as defined for the human MHC. From dark to light: Class I, Class II, Class III, others.

Figure 3

Phylogenetic relationship and structural comparison of the MHC-B regions of black grouse, chicken, turkey, quail and golden pheasant. The phylogenetic tree was constructed with the neighbor-joining method. Numbers next to the branch points indicate the bootstrap values as percentages of 1000 replicates. Pseudogenes of the quail MHC-B are not shown. Arrows and dotted lines highlight inversions and duplications. Numbers beside the arrows indicate the positions of the breakpoints on the compared sequences. Accession numbers: black grouse (JQ028669), chicken (AB268588), turkey (DQ993255), quail (AB078884), golden pheasant (JQ440366). Different shading of genes indicates different MHC gene families as defined for the human MHC. From dark to light: Class I, Class II, Class III, others.

The most remarkable feature of the galliform MHC-B is the gene orientation of TAPBP, TAP1 and TAP2. The black grouse MHC-B has inverted TAPBP and TAP1-TAP2 blocks compared to the chicken, while only the TAP1-TAP2 block is inverted compared to the turkey. The golden pheasant shares the orientation of TAPBP and the TAP1-TAP2 block with black grouse, whereas the orientation of these genes and gene blocks in quail is the same as in chicken (Figure 3).

Looking at the genes separately, we found that most of them were very similar in nucleotide and amino acid sequence among the five galliform species (Table 1). However, the phylogenetic relationships of these genes are not consistent. The phylogenetic tree constructed using the entire MHC-B sequences of the five species (Figure 3) follows the neutral expectation[35]. The phylogenies of the coding sequences of TAPBP, BRD2, DMA, DMB1, BF1 and TAP2 share the same topology as the tree constructed using the entire MHC-B, whereas the phylogenetic trees for the coding sequences of Blec1, BLB1, BLB2, DMB2, TAP1 and BF2 show different topologies within the clade of black grouse, turkey and golden pheasant (Figure 4). Interestingly, genes with aberrant phylogenetic relationships (with grouse or turkey basal to the other two species) showed signs of elevated dN/dS ratios compared to genes following the neutral phylogenetic expectation (Figure 5). This could be interpreted as an indication of increased balancing selection or relaxed purifying selection acting on these genes.

Figure 4

Phylogenetic relationships of the coding sequences of the homologous genes in black grouse, chicken, turkey, quail and golden pheasant. The phylogenetic trees were constructed with the neighbor-joining method. Numbers next to the branch points indicate the bootstrap values as percentages of 1000 replicates. Stars indicate that the tree topology is the same as that of neutral markers.

Figure 5

Plot of dN/dS values of MHC genes grouped by phylogenetic tree topology. One group includes the genes that follow the tree topology expected from neutral markers: TAPBP, BRD2, DMA, DMB1, BF1 and TAP2; the other includes the genes whose tree topology deviates from that of neutral markers: Blec1, BLB1, BLB2, DMB2, TAP1 and BF2.

Discussion

We have sequenced, annotated and analysed the MHC-B gene cluster of the black grouse. Black grouse is a wild bird species and represents the lineage Tetraoninae in the Galliformes[36]. With the availability of its MHC sequence and several other fully sequenced galliform MHC regions, we now, for the first time, have the opportunity to perform a comparative genomic study of the avian MHC. The MHC-B gene cluster of black grouse is just as simple and streamlined as that of chicken[15] (Figure 3). By contrast, the quail MHC-B has more duplicated genes and pseudogenes (10 BLB, 7 BF and 8 BG loci) than that of black grouse[23] (Figure 3). The turkey MHC-B and the golden pheasant MHC-B, which are phylogenetically closer to black grouse than chicken and quail are, also have expanded BLB genes[22, 25] (Figure 3). Our results provide additional evidence that the extremely compact nature of the chicken MHC is not merely an artefact of domestication, since we find a similar pattern in a related wild species that is fully outbred.

The nucleotide sequence of the black grouse MHC-B shows high similarity to that of other galliform birds (Table 1). However, individual MHC genes might have different evolutionary histories. The phylogenetic tree based on the entire MHC-B sequence shows exactly the same topology as neutral markers[35] (Figure 3). But when we used the coding sequences of each gene independently, only TAPBP, BRD2, DMA, DMB1, BF1 and TAP2 shared the same tree topology as neutral genes (Figure 4). Interestingly, for the genes Blec1, DMB2, TAP1 and BF2, the black grouse is more divergent than turkey and pheasant, while for the two BLB genes (BLB1 and BLB2), black grouse is closer to pheasant than to turkey (Figure 4, Additional file 3). If we use the dN/dS values to estimate the selection pressure on the genes, we find that the genes following the neutral phylogenetic expectation generally have lower dN/dS values than genes with aberrant tree topologies (Figure 5). Taken together, the deviation from neutral phylogenetic patterns and the elevated dN/dS levels indicate that the molecular evolution of several of the genes in the galliform MHC region is affected by selective forces. In particular, the MHC Class IIB genes (BLB1 and BLB2) show elevated levels of dN/dS. The peptide binding regions of these genes are classical examples of balancing selection[37]. An intriguing possibility is that the clustering of the grouse BLB and pheasant BLB might be due to specific selection in the wild, since both were sampled from natural populations, but this hypothesis needs further confirmation.

Another striking finding of the comparison of galliform MHC-B regions is the repeated inversion of the TAPBP gene and the TAP1-TAP2 block (Figure 3). Using data from all available galliform MHC sequences, we found that the inversion of the TAPBP gene, located between the two MHC Class IIB loci, seems to have happened once in the clade: either in the lineage leading to chicken and quail or in the lineage of pheasant, turkey and grouse, depending on the ancestral state. By contrast, the inversion of the TAP1-TAP2 gene block has occurred at least twice during the evolution of this clade (in which lineages depends on the ancestral state, which we cannot determine from our data). The TAP1-TAP2 block is flanked by the two Class I genes, BF1 and BF2. Gene conversion and interlocus recombination events in the evolution of MHC genes have been reported before (reviewed in [38]). Our result could provide indirect evidence for such events: if gene conversion occurred repeatedly, non-random breakpoints beside the two BF loci could lead to inversion of the TAP1-TAP2 gene block between them. However, this needs to be tested further.

In this study, we constructed a fosmid library and used it to screen for the MHC genes. Fosmid libraries have been widely used in large genome projects such as gap closure of the human genome or metagenomics analysis[39–41]. The success of our experiment demonstrates that a fosmid library is also a suitable and convenient way to sequence specific genomic regions of a species for which no genome map is available. To verify the expression of the identified MHC genes, we mapped the transcriptome data of a 454 sequencing project to the MHC region. This allowed us to efficiently confirm the expression of 17 identified genes. However, due to the limited 454 sequencing depth, it was not possible to cover all 19 putatively expressed genes. Moreover, not all exons were verified in the expressed genes. This could be because of limited sequencing coverage, alternative splicing or artefacts arising when mapping to short exons[42–44].

Conclusions

We conclude that there is extensive synteny between the MHC-B region of the black grouse and that of other galliform birds. Some large-scale changes, such as gene duplications and genomic rearrangements, have however occurred within the galliform lineage. Some of the genes in the region also seem to have been affected by selective forces within this clade, as inferred from deviating phylogenetic signals and elevated rates of non-synonymous substitutions. The MHC-B sequence of the black grouse reported here will provide a valuable resource for future studies on the evolution of avian MHC genes and on immunogenetics and ecology in black grouse.

Methods

Genomic sequencing

The genomic DNA used for sequencing the MHC cluster in black grouse was extracted from a male bird shot near Östersund, Sweden in November 2009. Muscle tissue was immediately stored in 70% ethanol at -20°C until use. DNA extraction followed the high molecular weight (HMW) protocol described by Blin et al.[45]. The fosmid library was constructed using the Copy Control Fosmid Library Production Kit according to the manufacturer's protocol (Epicentre Biotechnologies, WI, USA). DNA was first separated by pulsed field gel electrophoresis (PFGE), and 30–39 kb fragments were excised, purified, blunt-ended and ligated into the pCC1FOS fosmid vectors included in the kit. The ligated DNA mixture was then packaged using the supplied lambda packaging extracts and transformed into EPI300-T1 phage E. coli hosts. In total, the fosmid library consists of approximately 150,000 clones spread over clone pools in twenty 96-well plates.

Screening of the library was performed by a modified PCR-based clone pool method[46]. Nine pairs of PCR primers were used to screen for and pinpoint the MHC-bearing clones (Additional file 4). One of the primer pairs was developed in a previous study of black grouse MHC BLB exon 2[29], while the others were developed from gene regions highly conserved between chicken and turkey. Four overlapping fosmid clones covering the core MHC Class I and Class IIB genes were selected for sequencing. Shotgun subcloning and Sanger sequencing of the fosmid clones were performed at 8X coverage by Macrogen (Macrogen Inc., Seoul, Korea). A primer-walking method was used to fill the shotgun sequencing gaps.

The sequencing reads were vector-trimmed, quality-checked and assembled using CAP3[47]. The assembled fosmid clones were aligned into one consensus sequence using the ClustalW program implemented in CodonCode Aligner 2.06 (CodonCode Corporation, MA, USA)[48]. For the heterozygous parts of overlapping clones, we used the sequences from P3B2 and P5B8 as the consensus sequence (Figure 1A). We also followed a genomic-alignment strategy to detect the putative single nucleotide polymorphisms (SNPs) in the heterozygous parts[49, 50]. Alignment of the genomic sequences of the fosmid clones and manual identification of SNPs were conducted using the ClustalW program in CodonCode Aligner 2.06.
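The polymorphisms in this study were identified manually from the alignments. Purely as an illustration of the counting logic, the following Python sketch tallies SNPs and deletion-insertion polymorphisms (DIPs) between two aligned sequences. It assumes that the two sequences are already aligned to equal length with '-' as the gap character, and that each contiguous gap run counts as one DIP; the function name, these conventions and the toy alignment are assumptions made for the example, not the authors' procedure.

```python
def count_polymorphisms(aln_a, aln_b):
    """Count SNPs and DIPs between two aligned sequences of equal length,
    with '-' marking gaps. Each contiguous gap run counts as one DIP."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    dips = 0
    gap_state = None  # which sequence carries the current gap run, if any
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            # Column gapped in both sequences: uninformative, keep state.
            continue
        if a == "-" or b == "-":
            state = "a" if a == "-" else "b"
            if state != gap_state:
                dips += 1  # a new gap run starts here
            gap_state = state
        else:
            gap_state = None
            # Ignore ambiguous bases when calling SNPs.
            if a != b and a != "N" and b != "N":
                snps += 1
    return snps, dips


# Toy alignment (hypothetical): two SNPs and one 2 bp indel.
print(count_polymorphisms("ACGTTGCA--GT", "ACCTTGTAGGGT"))
```

For scale, the 275 SNPs reported for the 25,345 bp overlap of P3B2 and P2D1 correspond to roughly 10.9 SNPs per kb.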

Gene identification

Identification of coding regions and putative exons was conducted with three different gene prediction programs: Fgenesh (http://www.softberry.com), GeneMark.hmm (http://exon.gatech.edu) and Genscan (http://genes.mit.edu/GENSCAN.html)[51–53]. In Fgenesh and GeneMark.hmm, the organism-specific parameters were set to chicken; in Genscan, the parameters were set to vertebrate. In addition to the automatic gene identification, we also extracted individual gene sequences from the chicken MHC (GenBank accession numbers: AB268588 and AL023516), turkey MHC (GenBank accession number: DQ993255) and golden pheasant MHC (GenBank accession number: JQ440366), and used the ClustalW program in CodonCode Aligner to align them with the black grouse sequence to identify the gene positions. Finally, we manually curated the genes by comparing the results from all of the above approaches, as well as the RNA-Seq mapping result described below. Repeat elements were identified using RepeatMasker (http://www.repeatmasker.org), and tRNAs were identified using tRNAscan-SE[54]. The identification of CpG islands and the plotting of GC content were performed using the EMBOSS software suite[55].
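The windowed GC content profile can be illustrated with a short Python sketch. This is not the EMBOSS implementation: it is a minimal stand-in that assumes non-overlapping 200 bp windows (the window size follows the Figure 1F caption, while the step size and the function name are assumptions made here).

```python
def gc_content_windows(seq, window=200, step=200):
    """Return (window_start, GC fraction) for successive windows of `seq`.

    Trailing bases that do not fill a complete window are ignored.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results


# Toy sequence (hypothetical): 200 bp of GC repeats, then 200 bp of AT repeats.
toy = "GC" * 100 + "AT" * 100
print(gc_content_windows(toy))
```

Setting `step` smaller than `window` gives overlapping (sliding) windows, which produces a smoother profile at the cost of correlated neighbouring values.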

Transcriptome sequencing and gene verification

RNA-Seq data from a 454-transcriptome sequencing project were used to verify expression of the MHC genes (GenBank short read archive number SRA036234)[56]. These data were generated from a male individual collected near Uppsala, Sweden in 2008. Spleen tissue, where many immune-related genes are likely to be expressed, was used to construct the cDNA library. The 454 sequencing was conducted in two partial runs of the GS FLX sequencing instrument (Roche) with Titanium XL reagents and 70 × 75 mm PicoTiterPlates (PTP). In total, 182,179 quality-filtered sequencing reads with an average length of 321 ± 141 bp were used for mapping. We used the program gsMapper in Newbler 2.5.3 (Roche/454 Life Sciences) to map the 454 reads to the assembled black grouse MHC consensus sequence. To make sure that the mapped reads did not originate from MHC-like paralogues in other genomic regions, we searched the mapped reads against the entire chicken genome using BLAST. Reads with a best hit outside the MHC region were excluded from further analysis.
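The paralogue filter described above can be sketched in Python as follows. The sketch assumes that the BLAST results have already been parsed into (read_id, chromosome, start, end, bitscore) tuples and that the best hit is the one with the highest bit score; the function name, the tuple layout, the chromosome labels and the interval coordinates in the toy example are all hypothetical and do not reflect the actual chicken genome coordinates used in the study.

```python
def filter_mhc_reads(hits, mhc_chrom, mhc_start, mhc_end):
    """Keep read IDs whose best-scoring hit lies inside the MHC interval.

    `hits` is an iterable of (read_id, chrom, start, end, bitscore) tuples,
    for example parsed from tabular BLAST output.
    """
    best = {}
    for read_id, chrom, start, end, score in hits:
        # Remember only the highest-scoring hit per read.
        if read_id not in best or score > best[read_id][3]:
            best[read_id] = (chrom, min(start, end), max(start, end), score)
    kept = set()
    for read_id, (chrom, lo, hi, _score) in best.items():
        if chrom == mhc_chrom and lo >= mhc_start and hi <= mhc_end:
            kept.add(read_id)
    return kept


# Toy hits (hypothetical): r1 hits the MHC interval best, r2 hits elsewhere best.
toy_hits = [
    ("r1", "chr16", 1000, 1300, 400.0),
    ("r1", "chr1", 500, 800, 120.0),
    ("r2", "chr2", 10, 300, 350.0),
    ("r2", "chr16", 2000, 2250, 200.0),
]
print(filter_mhc_reads(toy_hits, "chr16", 0, 5000))
```

Taking `min` and `max` of the hit coordinates handles hits on the reverse strand, where tabular BLAST output reports the subject start as larger than the subject end.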

Comparative genomics analysis

The identity dot matrices of the black grouse MHC-B sequence and the chicken MHC-B sequence (GenBank accession number: AB268588) were generated using PipMaker[57]. The alignment of the entire MHC-B regions of the five galliform species was performed using the ClustalW program in CodonCode Aligner and the program Mauve 2.3.1[58], and checked manually. The GenBank accession numbers of the downloaded sequences are AB268588 (chicken), DQ993255 (turkey), JQ440366 (golden pheasant) and AB078884 (quail). The model of molecular evolution for the sequences was estimated with jModelTest[59], and the phylogenetic tree was constructed using the neighbor-joining method in MEGA 5.05[60]. A bootstrap of 1000 replicates was used to assess the reliability of the tree.

The coding sequences of the individual MHC genes were extracted directly from the GenBank entries of the sequences listed above using the GenBank online tools. For the quail, the BF genes flanking the TAP1-TAP2 block were used as BF1 and BF2, respectively, and the BLB genes flanking the TAPBP gene were used as BLB1 and BLB2, respectively. The alignments of the coding sequences were also conducted using ClustalW in CodonCode Aligner. The phylogenetic trees were constructed following the same protocol as for the entire MHC-B tree. The outgroup sequences used to construct phylogenetic trees for pooled BF and pooled BLB genes (in Additional file 3) were DQ251182 (domestic goose, Anser anser) and DQ490139 (mallard, Anas platyrhynchos), respectively. To estimate the selective forces, the ratios of nonsynonymous to synonymous substitution rates (dN/dS) were calculated using the Nei-Gojobori method in the program PAML 4.6[61, 62]. All pairwise dN/dS values between the five galliform species were summarised to calculate the average dN/dS value for each gene.
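The per-gene summary step can be sketched in Python as follows. With five species there are 10 unordered species pairs per gene. The sketch assumes that pairwise dN and dS estimates (for example, as reported by PAML) are already available, that the per-gene value is the arithmetic mean of the pairwise ratios, and that pairs with dS = 0 are skipped because the ratio is undefined; these choices, the function name and the toy values are assumptions made for the example and are not stated in the text above.

```python
from itertools import combinations


def mean_pairwise_dnds(pairwise, species):
    """Average dN/dS over all unordered species pairs for one gene.

    `pairwise` maps frozenset({sp1, sp2}) -> (dN, dS). Pairs with dS == 0
    are skipped because the ratio is undefined.
    """
    ratios = []
    for a, b in combinations(species, 2):
        dn, ds = pairwise[frozenset((a, b))]
        if ds > 0:
            ratios.append(dn / ds)
    if not ratios:
        raise ValueError("no species pair with dS > 0")
    return sum(ratios) / len(ratios)


# Toy values (hypothetical, three species only, not results from the study).
toy_species = ["grouse", "chicken", "turkey"]
toy_pairwise = {
    frozenset(("grouse", "chicken")): (0.02, 0.10),
    frozenset(("grouse", "turkey")): (0.03, 0.10),
    frozenset(("chicken", "turkey")): (0.04, 0.10),
}
print(mean_pairwise_dnds(toy_pairwise, toy_species))
```

Keying the dictionary by `frozenset` makes the lookup independent of the order in which the two species are named.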