Origin and evolution of the Ig-like domains present in mammalian leukocyte receptors: insights from chicken, frog, and fish homologues
- Cite this article as:
- Nikolaidis, N., Klein, J. & Nei, M. Immunogenetics (2005) 57: 151. doi:10.1007/s00251-004-0764-0
In mammals, many natural killer (NK) cell receptors, encoded by the leukocyte receptor complex (LRC), regulate the cytotoxic activity of NK cells and provide protection against virus-infected and tumor cells. To investigate the origin of the Ig-like domains encoded by the LRC genes, a subset of C2-type Ig-like domain sequences was compiled from mammals, birds, amphibians, and fish. Phylogenetic analysis of these sequences generated seven monophyletic groups in mammals (MI, MII, and FcI, FcIIa, FcIIb, FcIII, FcIV), two in chicken (CI, CII), four in frog (FI–FIV), and five in zebrafish (ZI–ZV). The analysis of the major groups supported the following order of divergence: ZI [or a common ancestor of ZI and F (a cluster composed of the FcIII and FIII groups)], F, CII (or a common ancestor of CII and MII), MII, and MI–CI. The relationships of the remaining groups were unclear, since the phylogenetic positions of these groups were not supported by high bootstrap values. Two main conclusions can be drawn from this analysis. First, the two groups of mammalian LRC sequences must have diverged before the separation of the avian and mammalian lineages. Second, the mammalian LRC sequences are most closely related to the Fc receptor sequences and these two groups diverged before the separation of birds and mammals.
Keywords: Leukocyte receptor complex; Chicken Ig-like receptors; Fc receptors; Ig-like domain groups
The mammalian natural killer (NK) cell receptors fall into two categories, one category belonging to the immunoglobulin superfamily (IgSF) and the other to the C-type lectin superfamily. The Ig-like receptors occupy a genomic region called the leukocyte receptor complex (LRC; Trowsdale et al. 2001). In mammals, the LRC contains several gene families including the killer cell Ig-like receptors (KIR), the leukocyte Ig-like receptors (LILR), and the paired Ig-like receptors (PIR), which form species-specific evolutionary clusters (Martin et al. 2002). Singleton genes have been identified in the LRC of humans, artiodactyls, and rodents (Hoelsbrekken et al. 2003; Maruoka et al. 2004; Morton et al. 2004). The presence of the LRC in all mammals so far studied suggests that this region formed before the mammalian radiation. Evolution of the LRC genes has thus far been studied mainly in mammalian species for which gene sequences have been available (Volz et al. 2001; Hughes 2002; Martin et al. 2002). Here we investigate the origin of Ig-like domains encoded by the mammalian LRC genes using recently generated homologous sequences from chicken, frog, and fish.
The annotated human and mouse LRC genomic sequences [from the National Center for Biotechnology Information (NCBI) builds 34 and 32, respectively] and the chicken Ig-like receptors (CHIR)-A and CHIR-B (Dennis et al. 2000) were used in tBLASTn (Basic Local Alignment Search Tool) searches (Altschul et al. 1990). The databases searched were as follows: chicken (Gallus gallus), the whole genome shotgun (WGS as of 04/04) database of NCBI and the ENSEMBL database (v22.1.1) (Washington University Genome Sequencing Center); frog (Xenopus tropicalis), the preliminary assembly (v1.0) of the genome (Department of Energy Joint Genome Institute); zebrafish (Danio rerio), the WGS (05/04) database of NCBI (Sanger Genomic Institute). BLAST searching was conducted with a permissive expectation (E) value threshold (E=10) to ensure that most sequences homologous to the mammalian LRC genes would be retrieved. The resulting BLAST hits were sorted according to the E-value and the coverage of the query sequence. Only hits covering at least 85 amino acid residues (the minimum length of the C2-type Ig-like domain; Klein and Hořejší 1997) were used for further analysis. Gene structures were predicted by using GenomeScan (Yeh et al. 2001). In the case of chicken, several genes annotated in the ENSEMBL database, as well as additional sequences identified by us, were used. Because of the incomplete status of the chicken and the frog genome projects and the fact that most of the genes identified reside in small non-overlapping contigs, the precise gene structure and the genomic organization of the chicken and frog genes are not known.
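The hit-selection step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the `Hit` record, its field names, and the `select_hits` function are hypothetical, and the tie-breaking rule (lower E-value first, then longer coverage) is an assumption, since the text states only that hits were sorted by E-value and query coverage.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is illustrative.
MIN_DOMAIN_LENGTH = 85   # minimum length of a C2-type Ig-like domain (residues)
MAX_EVALUE = 10.0        # permissive E-value cutoff used for the searches


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one tBLASTn hit."""
    subject_id: str
    evalue: float
    aligned_residues: int  # number of query residues covered by the hit


def select_hits(hits):
    """Keep hits that pass both thresholds, sorted by E-value, then coverage."""
    kept = [
        h for h in hits
        if h.evalue <= MAX_EVALUE and h.aligned_residues >= MIN_DOMAIN_LENGTH
    ]
    # Assumed ordering: smallest E-value first; ties broken by longer coverage.
    return sorted(kept, key=lambda h: (h.evalue, -h.aligned_residues))


if __name__ == "__main__":
    example = [
        Hit("contig_1", 1e-5, 90),
        Hit("contig_2", 5.0, 84),    # too short to span a full domain
        Hit("contig_3", 1e-5, 120),
    ]
    for hit in select_hits(example):
        print(hit.subject_id, hit.evalue, hit.aligned_residues)
```

The contig names in the usage example are invented placeholders, not sequences from the study.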
Ig-like domains were identified by using the SMART (Letunic et al. 2004) and PFAM databases (Bateman et al. 2004). To determine whether the sequences were of the C2-type, characteristic of the LRC genes (Martin et al. 2002), they were subjected to structure-based alignment analysis using C2-type domains with resolved tertiary structure as profiles. The Ig-like domain sequences were aligned using the profile alignment option of ClustalX v.1.81 (Thompson et al. 1997). Nucleotide alignments were obtained by correspondence to the amino acid alignments. Phylogenetic trees were constructed by using the neighbor-joining (NJ) method with the p-distance (proportion of differences; MEGA v2.1; Kumar et al. 2001). The p-distance is generally known to give a higher resolution of branching patterns because of its smaller standard errors (Nei and Kumar 2000). We also constructed parsimony trees, but since they were essentially the same as the NJ trees with respect to the major branching patterns, they are not shown.
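As an illustration of the distance measure named above, the following Python sketch computes the p-distance (proportion of differing sites) between aligned sequences and assembles the pairwise matrix that a neighbor-joining program would take as input. It is a simplified stand-in for the MEGA calculation, not a reproduction of it. The function names are hypothetical, and skipping any site with a gap in either sequence (pairwise deletion) is an assumption, because the gap treatment is not stated in the text.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: sites with a gap in either sequence are ignored.
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping sequence name to aligned sequence."""
    names = list(alignment)
    return {
        x: {y: p_distance(alignment[x], alignment[y]) for y in names}
        for x in names
    }


if __name__ == "__main__":
    toy = {"seq1": "ACGTAC", "seq2": "ACGAAC", "seq3": "AC-TAG"}
    for name, row in distance_matrix(toy).items():
        print(name, row)
```

Sequence identity is the complement of this quantity (identity = 1 - p), and the toy sequences in the usage example are invented.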
The extent of divergence of amino acid sequences was found to be higher than that of the nucleotide sequences in most pairwise comparisons (Suppl. Table 1). For example, in the sequences of the CI group the average amino acid identity was 57%, whereas the average nucleotide identity was 75%. One possible explanation of this difference could be positive selection acting on these sequences (as has been proposed for the KIR genes; Hughes 2002; Hao and Nei 2004). To test this hypothesis, we calculated the number of non-synonymous (pN) and synonymous (pS) substitutions per site using the original version of the Nei-Gojobori method (as implemented in MEGA2; Kumar et al. 2001). The analysis showed that in most comparisons the pN/pS ratio was less than 1 (mean value being 0.8505) and the Z-test suggested that the null hypothesis of neutrality could not be rejected at the 5% significance level (Nei and Kumar 2000). Thus, even if positive selection were restricted to specific sites of these sequences [as has been shown for the peptide binding sites of major histocompatibility complex (Mhc) class I molecules; Hughes and Nei 1988], it would not explain the overall high divergence of the amino acid sequences. Although the phylogenetic trees based on amino acid and nucleotide sequences generated topologies with similar branching patterns with respect to the major clades, the bootstrap values of the former trees were lower than those of the latter because of the high degree of protein sequence divergence. For this reason, only the trees based on nucleotide sequences are presented here.
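For orientation, the Z-test mentioned above is usually written in the following general form. This is a sketch of the standard statistic rather than a transcription of the authors' calculation. How the variances were estimated (for example, by bootstrap resampling of codons) is not stated in the text and is left unspecified here.

```latex
% Null hypothesis of neutrality: p_N = p_S.
% Positive selection is suggested only when p_N is significantly larger than p_S.
Z = \frac{p_N - p_S}{\sqrt{\operatorname{Var}(p_N) + \operatorname{Var}(p_S)}}
```

With a pN/pS ratio below 1, as reported for most comparisons, the numerator is negative, so the statistic cannot support pN > pS.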
We propose a speculative scenario for the evolution of the Ig-like domains that have been identified (Suppl. Fig. 4). According to this scenario, the ZI group shared a common ancestor with the ancestor of the F and CM clusters (F-CM) that existed before the divergence of the fish and tetrapod lineages (Z-F-CM). The divergence of the F and CM clusters from their common ancestor probably occurred after the fish-tetrapod split but before the bird-mammalian split.
Analysis of how the Ig-like domains of single mammalian, avian, amphibian, and fish proteins are distributed among the phylogenetic groups has revealed a number of interesting associations. First, the mammalian group I (MI) contains the first domains (D1 in Fig. 3b) of all the LRC-encoded proteins, except for the KIRs, the third domains of PIRs (D3), and the two domains of the osteoclast-associated receptors (OSCAR). MII contains all three domains of the KIRs, as well as the remaining LRC-encoded domains (Fig. 3 and Suppl. Fig. 5). This result indicates that within each of these groups, there are Ig-like domains belonging to proteins with divergent functions, which have a recent common ancestor (see also Martin et al. 2002). Second, the chicken CI group contains the D1 domains, and the CII group contains the D2 domains of the CHIR proteins (Fig. 3). Third, the mammalian FcI group contains the D1 domains (see Fig. 2b), FcII contains the D2, FcIII the D3, and FcIV the remaining domains (Fig. 2). This result indicates that within each group, there are domains belonging to proteins with distinct functions [the FcG and the IRTA (FcH); Davis et al. 2001; Miller et al. 2002], which share a common ancestor. Fourth, the frog group I (FI) contains mainly the D1 domains, FII the D2, and FIII the remaining domains, while FIV contains two domains encoded by a single gene (Fig. 2). Fifth, the zebrafish group I (ZI) can be divided into two subgroups (ZIA and ZIB in Suppl. Fig. 2). The ZIA subgroup contains the D1 and D2 domains, and the ZIB contains the D3 and D4 domains. The remaining domains form groups ZII–ZV (Suppl. Fig. 2). Although the clustering of the zebrafish groups is supported by low bootstrap values, their origin from a common ancestor is supported by the fact that the genes encoding these domains are located in tandem in a single genomic region (data not shown).
In conclusion, our results show that the Ig-like domains of the mammalian LRC and Fc receptors are also present in non-mammalian vertebrates. According to our analyses, the two groups of LRC Ig-like domain sequences probably diverged before the separation of the mammalian and avian lineages. In addition, our data suggest that the LRC sequences are most closely related to Fc receptor sequences and that the divergence between these two groups occurred before the separation of birds and mammals.
The Washington University Genome Sequencing Center, the DoE Joint Genome Institute and the Sanger Genomic Institute are acknowledged for making the chicken, frog and zebrafish sequences available. We thank the three anonymous reviewers for their comments and suggestions. We also thank Wojciech Makalowski and Dimitra Chalkia for valuable comments and discussions. This work was supported by a grant from the National Institutes of Health (GM20293) to M.N.