Introduction

In 1991, Yokoyama and co-workers showed that genes encoding the C-type lectin-like (CLEC) receptors NKR-P1 and Ly49 were genetically closely linked in the mouse and postulated the existence of a genetic region encoding functionally related C-type lectin superfamily (CLSF) members. As these receptors were primarily expressed by natural killer (NK) cells, the region was named the NK cell gene complex (NKC) (Yokoyama et al. 1991). In the pursuit of identifying structural gene(s) associated with regulation of NK cell-mediated alloreactivity in the rat, we subsequently identified the equivalent rat gene complex (Dissen et al. 1996). Remarkably, both in the mouse and the rat, the genetic distance between the Nkrp1a and the Ly49 genes was estimated at ∼0.5 cM, based on analyses of recombinant inbred strains (Yokoyama et al. 1991) and parental to F1 backcrosses (Dissen et al. 1996), respectively. Assuming average crossing-over frequencies, this would correspond to a physical distance of less than 1 Mb. However, the first physical map of the NKC, based on pulsed field gel electrophoresis, indicated the region to be much larger (Dissen et al. 1996). In the rat, other CLSF genes were rapidly mapped to the intervening region (Berg et al. 1998a, b; Dissen et al. 1997), followed by still more genes in the mouse and, with the identification of a human NKC, in the human (Renedo et al. 1997; Suto et al. 1997; Hamann et al. 1997; Colonna et al. 2000; Sobanov et al. 2001; Boles et al. 1999; Plougastel et al. 2001). The release of the sequences amassed by the international genome projects finally exposed the real size of this gene complex in different species. Thus, aided by the Rat Genome project, we could report that the genetic region spanning from the most centromeric Nkrp1 to the most telomeric Ly49 gene covered 3.3 Mb, with a predicted content of 67 CLSF genes (Nylenna et al. 2005a). However, although the conservation of sequence features among the many intraspecific paralogs and interspecific orthologs has greatly facilitated prediction of novel CLSF genes, in silico prediction is error prone and uninformative as to whether the predicted genes are expressed.

Functional traits such as resistance to cytomegalovirus (reviewed in Webb et al. 2002) and fungal infection (reviewed in Sun and Zhao 2007) and association with celiac disease have been mapped to discrete genes in the NKC (Hue et al. 2004), and association with experimentally induced arthritis was mapped to the neighboring gene complex, APLEC, also encoding CLSF receptors (Flornes et al. 2004; Lorentzen et al. 2007). For other quantitative trait loci mapped to this chromosomal region, including loci controlling experimentally induced inflammatory responses in the rat, the associated structural genes await identification. An accurate and complete inventory of the genetic content of this region would represent a useful tool in their ultimate identification and would thus be of importance for studying pathogenetic mechanisms behind human inflammatory and autoimmune diseases. We have previously described the cDNA cloning of the genes constituting the Klre1–Klri2 cluster (Berg et al. 1998a, b; Dissen et al. 1997; Saether et al. 2005; Westgaard et al. 2003) and, barring pseudogenes, most of the genes belonging to the Ly49 gene cluster (Naper et al. 2002; Nylenna et al. 2005b). Together, the two clusters make up the telomeric part of the rat NKC. From the centromeric end of the complex, we recently reported the cDNA cloning of the Nkrp1 genes and most of the Clr genes (Kveberg et al. 2009), leaving a gap between the most distal Clr gene and the Lox1 (Olr1) gene, previously reported by others (Nagase et al. 1998) and situated immediately centromeric to the Klre1–Klri2 cluster. Here, we close this gap by presenting the cDNA cloning and transcription patterns of eight genes spanning from Cd69 to Dectin1, including a novel gene which we have called Clec2m. We also include transcription analyses of the Nkrp1 and Clr genes and present the genomic organization of the 67 predicted group V CLSF genes localized in the rat NKC.

Materials and methods

cDNA cloning

As the genes reported here were cDNA-cloned before the availability of the rat genome sequence assembly, they were identified as described in Flornes et al. (2004) by (1) searching the GenBank rat Trace archive and the EST database for sequences homologous to the published mouse and human receptors, using the NCBI BLAST program, and (2) performing pairwise BLAST on recently released partially or fully sequenced rat BAC clones. Gene-specific (nested) primers in the 5′- and 3′-untranslated regions (UTRs) were generated from the predicted sequences (primers shown in Supplemental Table 1). The 3′-UTR of Clec12a could not be identified, and this gene was therefore cDNA-cloned using RACE with nested 3′ primers from the GeneRacer™ kit (Invitrogen).

mRNA from mouse and rat lymph node, bone marrow, or testis was isolated with the Dynabeads mRNA Direct kit (Invitrogen), and first-strand cDNA synthesis was carried out with M-MLV reverse transcriptase RNase H (Promega) using 3–5 μl of the eluate in a 20-μl reaction volume. PCR was performed on first-strand cDNA using gene-specific primers and PfuTurbo DNA polymerase (Stratagene), and the products were cloned into the pCR2.1-TOPO vector (Invitrogen). For every gene, three or more independent clones were sequenced. The sequences were analyzed using software supplied by the Norwegian EMBNet node at the Biotechnology Centre in Oslo, Norway.

Quantitative PCR

A panel of RNA from purified cells and tissues from PVG rats was prepared previously (Dissen et al. 1997). Briefly, cells and tissues were extracted by lysis in isothiocyanate (Gibco BRL) followed by ultracentrifugation on a cesium chloride (Gibco BRL) gradient and phenol/chloroform extractions. Dendritic cells were isolated by spontaneous migration from split ear-halves floating on RPMI for 48 h; testes were dissected from rats at day 4 and at weeks 7.5 and 15. RNA from dendritic cells and testis was isolated with RNeasy (Qiagen), and first-strand cDNA synthesis for all samples was performed with Superscript II reverse transcriptase (Invitrogen) and random hexamers using 1 µg total RNA in a 30-µl reaction. Amplification was performed on an Applied Biosystems 7900HT Fast Real Time PCR system (PerkinElmer) using the 5′ nuclease assay and qPCR SuperMix with Rox (Invitrogen). Analyses were done with the SDS2.1 software (Applied Biosystems, PerkinElmer). Primers and probes were designed using PrimerExpress 3.0 (Applied Biosystems, PerkinElmer) and purchased from Eurofins MWG Operon (Ebersberg, Germany). Individual samples were run in triplicate, and the relative quantity of RNA for each target was normalized to RNA from the reference gene plasma membrane calcium ATPase 4 (PMCA4).
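The normalization step can be illustrated with a minimal Python sketch. It assumes that copy numbers for target and reference have already been obtained from the instrument software, that triplicates are summarized by their arithmetic mean, and that the thresholds of 1/100 and 1 given in the Fig. 3 legend are inclusive lower bounds. The function names, the "low" label for ratios below 1/100, and the example numbers are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence

# Thresholds relative to the reference gene PMCA4, as stated in the Fig. 3 legend.
INTERMEDIATE_THRESHOLD = 1 / 100
HIGH_THRESHOLD = 1.0


def mean_copies(triplicate: Sequence[float]) -> float:
    """Summarize replicate wells by their arithmetic mean (an assumption)."""
    return sum(triplicate) / len(triplicate)


def relative_transcription(target_copies: float, reference_copies: float) -> float:
    """Ratio of target copy number to reference (PMCA4) copy number."""
    if reference_copies <= 0:
        raise ValueError("reference copy number must be positive")
    return target_copies / reference_copies


def classify(ratio: Optional[float]) -> str:
    """Map a ratio to a transcription class; None means no PCR product was detected."""
    if ratio is None:
        return "not detected"
    if ratio >= HIGH_THRESHOLD:
        return "high"
    if ratio >= INTERMEDIATE_THRESHOLD:
        return "intermediate"
    return "low"  # label assumed; the legend names only the two upper thresholds


# Hypothetical copy numbers for one target in one sample.
target = mean_copies([48.0, 50.0, 52.0])
reference = mean_copies([990.0, 1000.0, 1010.0])
ratio = relative_transcription(target, reference)
print(f"{ratio:.3f} -> {classify(ratio)}")
```

Keeping the thresholds as named constants makes it straightforward to repeat the classification against a second reference gene, such as HPRT1, which the Fig. 3 legend reports as giving similar results.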

Bioinformatics

Sequence similarity searches were performed with BLAST programs, running on the NCBI or Ensembl websites. The phylograms were constructed with NJplot based on alignments made with the ClustalX program, with the pileup program in the GCG package (Accelrys Inc., San Diego, CA, USA), and with Bayesian inference methods (MrBayes, http://mrbayes.csit.fsu.edu).

Results and discussion

Sequences and transcription patterns

Eight genes spanning the gap between the Nkrp1/Clr genes and the Lox1 locus were cDNA cloned. In Fig. 1, the predicted amino acid sequences are presented together with their mouse and human counterparts, with closest sequence similarity (Fig. 2) and conserved chromosomal positions and orientation between the three species as orthology criteria.

Fig. 1

Amino acid sequence alignments of the CLSF receptors. r rat, m mouse, h human. Dashes mean identity with top sequence. Points indicate gaps. Ex1, Ex2, etc. on top of alignment show start site of corresponding exons. Predicted transmembrane regions (TMpred) are underlined. ITAMs and ITIMs are indicated. Potential N-glycosylation sites are boxed. Membrane external cysteines are highlighted in yellow. Numbers connected with black lines on top of the CD69 and Dectin1 sequences indicate SS-bonds as demonstrated for crystal structures of human CD69 and mouse Dectin1 (http://www.ncbi.nlm.nih.gov/Genbank/). For the other genes, the corresponding cysteines are similarly numbered. X above cysteines in Clec1a indicates probable extra disulphide bridge, which in addition to the bridge between the cysteines labeled 2, links the first α helix with the last β strand of the lectin-like domain. GenBank accession numbers: rat CD69–GU357488, rat Clec2m–GU357487, mouse Clec2m–GU553093, rat Clec12a–GU357484, rat Clec12b–GU357483, rat Clec1b–GU357482, rat Clec9a–GU357486, rat Clec1a–GU357481, rat Dectin-1–GU357485. The rat strains from which the sequence was derived are shown in parentheses (PVG, DA, or both). It should be noted that the reported coding sequences were identical to the corresponding parts of the genomic rat sequence based on the BN strain. The single exception was rat Dectin1, with two single base substitutions in exon 6, one silent and one missense giving rise to an A225G substitution (identical for four independent cDNAs sequenced)

Fig. 2

Dendrogram of the novel receptors (plus Lox1) and their human and mouse orthologs (based on amino acid sequences). Numbers indicate bootstrap values (1,000 reiterations) and Bayesian clade credibility values. Values shown only for branching points with bootstrap values >900 or credibility values >90

Cd69 and Clec2m

Whereas the majority of group V CLSF genes consist of six coding exons, Cd69 and Clec2m have only five (Fig. 1). Rat Cd69 shows relatively high transcription levels, as measured by RT-PCR, in dendritic cells derived from skin (sDC), peritoneal macrophages (pMΦ), and resting B and T cells as well as ConA-activated lymphocytes (Fig. 3). The transcription in resting lymphocytes is noteworthy. Cd69 was first cloned in the human (Ziegler et al. 1993), where it was originally reported to be expressed primarily on activated lymphocytes and on NK cells, neutrophils, and platelets. The CD69 receptor has consequently been widely used in vivo as a marker for leukocyte activation and inflammatory responses. It has been extensively studied in the human and the mouse, where in vitro studies originally indicated a proinflammatory function. More recent in vivo results indicate that it may act as a regulatory molecule, modulating inflammatory responses (reviewed in Sancho et al. 2005) as well as playing a role in lymphocyte trafficking, by interacting with the S1P1 receptor required for efficient egress of lymphocytes from the thymus and lymph nodes (Shiow et al. 2006) and of immature B cells from the bone marrow (Allende et al. 2010).

Fig. 3

Gene transcription as measured by quantitative PCR. Transcription levels were calculated as the ratio of the copy number of the target gene to the copy number of the endogenous reference gene, PMCA4. Thresholds for intermediate and high transcription were set to 1/100 and 1 relative to PMCA4. Not detected no PCR product detected after 40 cycles, DC dendritic cells, PM peritoneal macrophages, RNK16 NK cell line, LAK IL2-stimulated NK cells. Similar results were obtained using the HPRT1 gene (hypoxanthine guanine phosphoribosyl transferase 1) as the endogenous reference gene

Clec2m has, to our knowledge, not previously been cDNA cloned. Hao et al. (2006) labeled the in silico predicted rat and mouse genes Clec15a and Clec15ap, respectively, with the mouse gene predicted to be a pseudogene. We have cDNA cloned full-length versions of the gene both in the rat and the mouse (Fig. 1), but were unable to identify it in the human. As the term Clec15a is an established synonym for Klrg1 (Mafa), we propose the name Clec2m, denoting close kinship to the Clr (Clec2d) genes and high transcription in macrophages (see below). The gene shows its highest transcription levels in sDC and pMΦ and, among tissues, in bone marrow and testis (Fig. 3).

Clec12a (Micl) and Clec12b

Clec12a possesses a single immunoreceptor tyrosine-based inhibition motif (ITIM) sequence (IxYxxL) in the cytoplasmic tail. Rat Clec12b contains the ITIM-like sequence AxYxxL, whereas mouse and human Clec12b both have the consensus ITIM VxYxxL (Fig. 1). For Clec12a, the inhibitory function suggested by the ITIM has been demonstrated both in the human, where it was first identified as myeloid inhibitory C-type lectin-like receptor (Marshall et al. 2004) and later in the mouse (reviewed in Huysamen and Brown 2009). In the human and the mouse (Pyz et al. 2008), Clec12a is expressed predominantly by DC, macrophages, and granulocytes and down-regulated following stimulation with select toll-like receptor ligands. In the rat, the highest transcription levels of Clec12a were found in sDC and pMΦ, and among tissues in bone marrow and brain. Less is known about the closely related Clec12b (reviewed in Huysamen and Brown 2009). Its transcription profile in the rat is distinct from that of Clec12a, with moderate transcription levels in sDC and neutrophils, no detectable transcription in pMΦ, and with strong transcription in the testis (Fig. 3).
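The motif patterns named above (IxYxxL and VxYxxL for the ITIM, AxYxxL for the ITIM-like variant in rat Clec12b) can be searched for with regular expressions. The sketch below is only an illustration of such a scan: the two example tails are invented toy sequences, not the receptor sequences from Fig. 1, and the function and dictionary names are hypothetical.

```python
import re
from typing import List, Tuple

# Patterns as written in the text, with "x" meaning any residue.
MOTIFS = {
    "ITIM (I/VxYxxL)": re.compile(r"[IV].Y..L"),
    "ITIM-like (AxYxxL)": re.compile(r"A.Y..L"),
}


def find_motifs(tail: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 1-based start position, matched residues) for each hit."""
    hits = []
    for name, pattern in MOTIFS.items():
        # finditer reports non-overlapping matches, which suffices for short tails.
        for match in pattern.finditer(tail.upper()):
            hits.append((name, match.start() + 1, match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Invented cytoplasmic tails, for illustration only.
toy_tail_consensus = "MSEEITYADLQF"
toy_tail_variant = "MPDEARYSELKV"

print(find_motifs(toy_tail_consensus))
print(find_motifs(toy_tail_variant))
```

A sequence match of this kind only flags a candidate motif; whether the tyrosine is phosphorylated and recruits phosphatases has to be shown experimentally.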

Clec1b, Clec9a, and Dectin1

These receptors share an atypical immunoreceptor tyrosine-based activation motif (ITAM) D/ExYxxL in their cytoplasmic tail (Fig. 1). In the human and the mouse, all three have been shown to recruit and mediate activation signals via spleen tyrosine kinase (Syk) (Rogers et al. 2005; Fuller et al. 2007; Huysamen et al. 2008). Clec1b, first identified in the human and named CLEC-2 by Colonna et al. (2000), is expressed on various cell types including megakaryocytes and platelets, where it triggers platelet activation and aggregation (Suzuki-Inoue et al. 2006). Various exogenous and one endogenous ligand, the sialoglycoprotein podoplanin, have been identified (Suzuki-Inoue et al. 2007). In the rat, Clec1b transcription levels were moderate in neutrophils, low in lymphocytes, undetectable in sDC and pMΦ and high in spleen and liver (Fig. 3). Clec9a has recently been identified both in the human and the mouse as an activating receptor (Huysamen et al. 2008; Caminschi et al. 2008), expressed primarily by a subset of monocytes and the rare CD141+ subset of DC (Huysamen et al. 2008). In the rat, it shows high transcription in the spleen, moderate to low transcription levels in several other tissues and cell types, but no detectable transcription in sDC or CD4+ T cells (Fig. 3). Whereas the physiological roles of Clec1b and Clec9a remain unknown, much insight has been gained about roles of dendritic cell-associated C-type lectin 1 (Dectin1). It was first identified in the mouse (Ariizumi et al. 2000) and has been shown both in the mouse and in the human to function as the main leukocyte β-glucan binding receptor, with a major role in antifungal immunity (reviewed in Sun and Zhao 2007; Brown 2006). In the mouse, Dectin1 is most strongly expressed on monocytes, macrophages, neutrophils, and microglia, weakly on a T cell subset, and in the human also on B cells, mast cells, and eosinophils (Brown 2006). In the rat, Dectin1 was transcribed in all tissues and cell types tested (except the NK cell line RNK16), with particularly strong transcription in sDC, pMΦ, bone marrow, and the thymus (Fig. 3).

Clec1a

Finally, although the human Clec1a receptor was first published 10 years ago, under the name CLEC-1 (Colonna et al. 2000), little is known about its functional properties and roles. In the human, the cytoplasmic tail contains one tyrosine residue close to the N-terminus. This is also present in the mouse and the rat sequences, which have an additional tyrosine residue forming the pattern YxxTx13YxxT (Fig. 1). Whether the tyrosines are subject to phosphorylation is not known. A notable sequence feature of Clec1a is the two cysteines immediately preceding the cysteines predicted to form the disulphide bond labeled 2 in Fig. 1. This is a characteristic shared with the Ly49 receptors, where the two extra cysteines form an additional SS-bond between the first α helix and the last β strand of the lectin domain. In the rat, Clec1a is transcribed in all cell types and tissues tested, with the strongest transcription in ConA-stimulated blasts, spleen, and kidney.

Comments on transcription patterns

The presence of ITIM motifs in Clec12a (and possibly Clec12b) and (atypical) ITAM motifs in Clec1b and Clec9a suggests inhibitory and activating functions, as shown for the human and mouse orthologs. As the occurrence of closely related receptors with opposite signaling properties is suggestive of “paired opposing receptors,” their widely different transcription profiles are noteworthy. In Fig. 3, we have also included transcription analyses of the Clr and the Nkrp1 genes, as this information has not previously been published in the rat and is therefore needed for complete comparison of the expression programs of the rat NKC genes. The pattern of strong NK cell transcription exhibited by NKR-P1A, -B, and -F receptors is the rule for the majority of the rat NKC receptors previously reported. In contrast, most of the genes reported here, as well as most of the Clr genes, show modest to low or no transcription in NK cells, demonstrating a genetic content beyond the original definition of this large congregation of genes as an NK cell gene complex.

Phylogenetic analyses

A striking feature of the eight receptors reported is the wide range of sequence divergences between the rat, mouse, and human orthologs (Fig. 2). The most extreme examples are Clec1a, which is highly conserved, and Clec12a, which is extensively changed between the three species (Figs. 1 and 2). For Clec1a, the rat and mouse protein percent identity is 95.5 and the rodent versus human identities 71.3/69.4 (human-rat/human-mouse). For Clec12a, the corresponding figures are 73.3 and 47.7/50.6. The dissimilarities between the three Clec12a sequences suggest positive selection, a notion further strengthened by Ka/Ks analyses of the cDNA sequences. Ka/Ks analysis compares the rate of non-synonymous to synonymous substitutions, with a higher value indicating possible positive selection. For Clec1a, the mouse versus rat Ka/Ks ratio is 0.10 and for Clec12a, 1.00. Splitting the Clec12a sequences into exons 1–3, encoding the cytoplasmic tail/transmembrane/neck domains, and exons 3–6, encoding the lectin domain, i.e., the presumed ligand binding part of the receptor, gives Ka/Ks values of 0.57 and 1.42, respectively. The ligands of these receptors are unknown, but the analysis suggests that the Clec1a ligand is phylogenetically conserved, whereas the Clec12a receptors may be chasing a more rapidly evolving ligand.
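For orientation, the Ka/Ks ratio can be written out explicitly. The text does not state which estimation method was used, so the formulation below (counting of synonymous and non-synonymous sites and differences, followed by a Jukes-Cantor correction for multiple hits, as in the widely used Nei-Gojobori approach) is an assumption shown only to make the quantity concrete. The symbols N_d, S_d, N, and S are introduced here for illustration.

```latex
% N_d, S_d : observed non-synonymous and synonymous differences between two aligned coding sequences
% N, S     : numbers of non-synonymous and synonymous sites in the alignment
\[
p_N = \frac{N_d}{N}, \qquad p_S = \frac{S_d}{S}
\]
% Jukes-Cantor correction for multiple substitutions at the same site
\[
K_a = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), \qquad
K_s = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right)
\]
% The ratio reported in the text
\[
\omega = \frac{K_a}{K_s}
\]
```

Under this reading, a ratio well below 1 (such as the 0.10 reported for Clec1a) is consistent with purifying selection, a ratio near 1 with relaxed constraint or a mixture of constrained and rapidly changing sites, and a ratio above 1 (such as the 1.42 reported for the Clec12a lectin domain) with possible positive selection.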

Phylogenetic analyses indicate that CD69 and Clec2m belong to a different subfamily than the other six receptors (Fig. 2). On inclusion of the other CLSF receptors encoded by the rat NKC, they exhibit the closest sequence similarity with the Clr subfamily (Fig. 4). Although sensitive to gap parameters and not significant according to bootstrap analyses, the association seems reasonable considering the physical localization of Cd69 and Clec2m next to the Clr genes (Fig. 5) combined with the fact that they consist of only five coding exons, lacking a separate exon encoding the external membrane-proximal stalk. This is a property they share with the Clr genes, and contrasts with all the other NKC CLSF genes, which generally consist of six coding exons (apart from Lox1, which has eight coding exons).

Fig. 4

Dendrogram based on deduced protein sequences of all the CLSF genes encoded by the rat NKC (only eight of the predicted 34 Ly49 receptors shown). Also included are Mafa1/Klrg1 and Mdl1/Clec5a, plus the seven predicted functional APLEC genes. Vertical black lines indicate major branches of the NKC receptors. The genes containing five coding exons are indicated with 5 ex. Only bootstrap values >900 from 1,000 reiterations are shown

Fig. 5

Map of the rat NKC. Arrows indicate gene orientation. Numbers below the line indicate distance from the starting point (the Nkrp1a gene at 165.637 Mb according to the current release of the rat genome). The four major clusters are color-coded: green Nkrp1/Clr gene cluster (including Cd69 and Clec2m), magenta Clec12a–Lox1 cluster, red Klre1–Klri2 cluster, blue Ly49 cluster. The topography of the chromosomal map with respect to major clusters is seen to be congruent with the major branches of the phylogenetic tree in Fig. 4. It should be emphasized that most of the genes shown here have been cDNA cloned primarily in the DA or PVG strain, so that the map shown here represents the positions of the predicted alleles from the BN strain, from which the rat genome sequence is derived. The extent of interstrain variation, including copy number variation, is so far largely uncharted

As for the other six genes, the Clec12 and Clec1 genes tended to cluster together (Fig. 2). The shape of the tree is, however, highly sensitive to gap parameters and to inclusion of other NKC genes (Fig. 4). The tree shape in Fig. 2 is similar to that reported by Hao et al. except for Clec9a, which was assigned to an entirely different clade of CLSF genes (Hao et al. 2006). Arguments in favor of including Clec9a in the Clec12–Lox1 clade are the shared chromosomal localization and sequence features such as the atypical ITAM motif. Furthermore, a conserved feature of the NKC-encoded CLSF receptors is the Cx10C loop of amino acids clamped by an SS-bridge between the flanking cysteines labeled 1 in Fig. 1. In front of these, Clec1a and -b, Clec12a and -b, and Clec9a have two additional cysteines, the first encoded by a codon near the end of exon 3 (Fig. 1) and restricted to these receptors, whereas the second, encoded by the third codon in exon 4 (Fig. 1), is also present in CD94, Klrk1, the Nkg2, and the Klri receptors. The atypical ITAM motif and the two extra cysteines are likely to represent derived (apomorphic) rather than ancestral (plesiomorphic) states, so that assigning Clec9a to a different clade would imply convergent evolution or sequence transfer through gene conversion. The simpler explanation is that their shared presence stems from phylogenetic kinship. Whether the extra cysteines are involved in inter- or intrachain disulphide bonding is not known.

The organization of genes in the rat NKC

The phylogenetic tree shown in Fig. 4 includes group II and group V CLSF members localized on rat chromosome 4 (for simplicity, only eight Ly49 genes are shown, as all 34 cluster together in a single clade). In addition to the genes depicted in Fig. 5, the tree includes Mafa1 (Klrg1), Mdl1 (Clec5a), and the seven genes grouped together in the APLEC cluster. The tree is seen to consist of six major branches. The exact branching is highly sensitive to gap parameters, but apart from the placement of Clec9a as discussed above, it is identical to the tree presented by Hao et al. (2006) with respect to major branches. The interesting feature of the tree is that the gene content of the major branches correlates exactly with the chromosomal clustering of the genes (apart from Mafa1 and Mdl1). In addition to supporting the soundness of the tree, the observation indicates that this large congregation of paralogous genes has evolved by local gene duplications without major reshufflings. At the local level, chromosomal inversion events have occurred, as evidenced by the changes in gene orientations, as well as segmental duplications involving more than one gene in the case of the Ly49 gene cluster, as previously discussed (Nylenna et al. 2005b). A notable feature is the conservation of orientation of the 41 most telomeric genes of the complex, from Klrk1 to Ly49i8. The evolutionary events leading to the gene arrangement of the Nkrp1/Clr cluster seem particularly complex. In addition to shifting gene orientations, the cluster consists of two separate blocks containing Nkrp1 genes with opposite signaling function next to series of Clr genes with incongruent patterns of topographical versus sequence proximity.

The demarcation of the NKC

The chromosomal region starting with Nkrp1a and ending with Ly49i8 contains almost all known group V CLSF genes in the rat. Exceptions are Cd72, localized on chromosome 5, and Klrg1 (Mafa1), Mdl1 (Clec5a), and Clec2l, all three on rat chromosome 4, at distances 7, 97, and 99 Mb centromeric to Nkrp1a, respectively. Furthermore, the region contains nothing but group V CLSF genes, with Gabarapl1 (Gamma-aminobutyric acid receptor-associated protein-like 1) as the single known exception. Although heterogeneous in gene content, it displays conserved gene organization and persistence of orthologous gene lineages across four major mammalian orders, as described by Hao et al. (2006). These authors also included in the NKC the Mincle–Dcir gene cluster (Fig. 5), which in all four orders occupies a chromosomal region close to, but clearly separate from, the Nkrp1a–Ly49i8 region. When we first described the genetic region containing the Mincle–Dcir gene cluster, which we named the antigen-presenting cell lectin-like receptor gene complex (APLEC), we presented arguments for and against including these genes in the NKC. Counterarguments were (1) the evolutionary distance between the two groups of genes: whereas the APLEC genes are classified as group II CLSF (http://www.imperial.ac.uk/research/animallectins/), with conserved amino acid residues involved in calcium-dependent saccharide binding (Weis et al. 1991), the group V CLSF have lost these amino acid motifs (Weis et al. 1998); and (2) the distance separating the APLEC genes from the NKC: in the mouse and the rat, it is 5–6 Mb; in the human, cow, horse, and dog, only ∼1.5 Mb. In all these species, the intervening region is packed with non-CLSF genes, which would be included in the NKC if it were defined as encompassing the APLEC, resulting in a loss of communicative precision and a potential source of confusion in the future mapping of functional traits to this chromosomal region. Even when narrowly defined, the NKC represents one of the largest congregations of paralogous genes known in vertebrates.