Background

In recent years, a family of molecules with roles in apoptosis and immune regulation has been discovered in mammalian genomes. This gene family is known under several names, including the CATERPILLER (CLR), NACHT, NOD-LRR or NOD-like receptor (NLR) family, and comprises two major subfamilies of NOD and NALP molecules, along with three divergent members: IPAF, the MHC class II transactivator (CIITA) and neuronal apoptosis inhibitory protein (NAIP) [1, 2]. Official names have recently been assigned to many members of this family by the HUGO Gene Nomenclature Committee (HGNC) [3] using the NLR prefix (Table 1). NLRs are recognized by the presence of three specific domains: an effector domain at the N-terminus that is involved in protein:protein interactions, a central NACHT (or nucleotide binding oligomerization/NOD) domain and a C-terminal leucine-rich repeat (LRR) domain. They are, therefore, structurally similar to disease resistance (R) proteins found in plants that are well known for their anti-microbial activities [4]. In humans, 22 NLRs have been described, including 14 NALPs, with a PYRIN effector domain, and 5 NODs whose effector domain is typically a caspase recruitment domain (CARD).

Table 1 Human NLR sequences used for analyses

The functions of the NLRs are presently not well defined. However, based on their structural characteristics, these molecules are thought to be expressed in the cytosol of immune-related cells, and have been implicated in autoimmune diseases and responses to bacterial [5] or viral molecules [6], supporting their importance in host immunity. Some of these molecules activate caspase-1 [7], while others initiate [8, 9] or inhibit NF-κB signaling [10]. These two molecular pathways are fundamental to a molecular platform known as the inflammasome [7], which coordinates the production and processing of important inflammatory cytokines such as interleukin (IL)-1, IL-18 and IL-33 in mammals. Proteins that assemble in this caspase-1 inflammasome vary according to the cell type and stimulus [11]. Other molecules (e.g. caspase 11) that are necessary for inflammasome function are thought to be generated or recruited as a result of cross-talk between NLR and toll-like receptor (TLR) signaling [12]. According to current hypotheses, the activation of NLRs occurs following recognition of specific ligands by their LRR domains, similar to the way that TLRs recognize molecules from extracellular pathogens. NLR proteins are, therefore, believed to represent cytosolic pattern recognition receptors (PRRs) that use LRR regions to detect intracellular pathogens. Those NLRs that are better defined functionally include NOD1, NOD2 and NALP3. NOD1 and NOD2 have both been shown to play a role in immunity of the mammalian gut and are highly expressed in epithelial cells or macrophages associated with the intestine. NOD1 recognizes a molecule known as meso-DAP (γ-D-glutamyl-meso-diaminopimelic acid), which is a peptidoglycan (PGN) component found mainly in Gram-negative bacteria [13], while NOD2 recognizes muramyl dipeptide, a peptidoglycan component found in both Gram-negative and Gram-positive bacteria [14]. 
NALP3 (alias cryopyrin) has been shown to recognize a wide range of molecules, including bacterial RNA and synthetic viral RNA/DNA mimics (R837 and R848) [6]. NALP3 becomes activated in TLR-primed macrophages in response to ATP (adenosine triphosphate) and bacterial toxins that lower cytoplasmic K+ [12], which is thought to be the major mechanism in the NALP3 response to certain Gram-positive bacteria. In a distinct pathway, monosodium urate and calcium pyrophosphate dihydrate crystals have been shown to increase caspase-1 activity in a NALP3-dependent (TLR-independent) manner [1], representing potential 'danger signal' ligands for NALP3 [5] and defining a further role for NLRs in recognizing cellular stress.

Members of the NLR family have not been extensively studied in taxa other than mammals, although recent reports indicate that some members of this family exist in lower vertebrates [15] and in invertebrates [16]. Extending the knowledge of NLRs in ectotherms, this study reports an extensive overview of the NLR family in teleost fish, represented using information derived from the zebrafish Danio rerio. Here, we describe the gene phylogeny and expression of three major subfamilies of NLRs in teleostei, which we designate NLR-A and NLR-B (resembling mammalian NOD and NALP subfamilies respectively) and NLR-C, a large subfamily (characterized by a NOD3-like NACHT domain and an unusual C-terminal domain) that appears in all teleostei genomes and is unique to bony fish. The implications of all three subfamilies in immune regulation of fish are discussed.

Results

Many NLR-like sequences were identified in the genome and EST databases of non-mammalian vertebrates. These genes were compared by phylogenetic analysis of their deduced NACHT domains (Fig 1). Mammalian NLRs are categorized into NOD and NALP families according to previous publications [1] and as depicted in Table 1. In the zebrafish genome, three distinct subfamilies were identified and highly supported by bootstrap analysis; some resembled mammalian NODs (designated subfamily A; Table 2; Fig 1C), some resembled mammalian NALPs (designated subfamily B; Table 3; Fig 1B) and some formed a unique clade, closely related to NOD3, which was restricted to teleostei (designated subfamily C; Table 4; Fig 1A).

Figure 1

Phylogenetic comparison of vertebrate NLR molecules. Amino acid sequences of the NACHT domains (between the GxxGxGKS/T motif and the 'FAAFY' signature of human NOD2 or equivalent region in other NLRs) of vertebrate NLRs were aligned using CLUSTALW. Trees were constructed from these multiple alignments using the Minimum evolution and Neighbor-joining methods within the MEGA 3.1 program, using Poisson correction and complete deletion of gaps. Minimum evolution trees are shown. The resulting trees were bootstrapped 1000 times (shown as percentages). [A] Zebrafish NLRs were compared to human NLRs to estimate orthology. [B] The NALP subfamily was analyzed in more detail by comparing all zebrafish and Xenopus tropicalis predicted NALP-like molecules to human NALPs. [C] The NOD/NLR-A subfamilies of zebrafish, frog, chicken and humans were compared. DR = Danio rerio; XT = Xenopus tropicalis; GG = Gallus gallus; HS = Homo sapiens.
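The region boundaries described in this legend can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure used to prepare the alignments: the function name nacht_region and the test sequence are hypothetical, the boundary motifs are included in the returned segment (the legend does not state this), and NLRs that lack a literal FAAFY signature would need the equivalent region to be located by alignment instead.

```python
import re

# Start anchor from the legend: GxxGxGKS/T, where x is any residue.
START_MOTIF = re.compile(r"G..G.GK[ST]")
# End anchor: the signature named for human NOD2.
END_SIGNATURE = "FAAFY"


def nacht_region(protein, end_signature=END_SIGNATURE):
    """Return the segment from the start motif through the end signature.

    Returns None when either anchor is missing. Including both anchors in
    the output is an assumption made for this illustration.
    """
    seq = protein.upper()
    start = START_MOTIF.search(seq)
    if start is None:
        return None
    end = seq.find(end_signature, start.end())
    if end == -1:
        return None
    return seq[start.start():end + len(end_signature)]


# Hypothetical toy sequence, not taken from any real NLR.
toy = "MSSAGVAGSGKSTLLQRWHFAAFYLAEE"
print(nacht_region(toy))
```

Sequences for which the function returns None would have to be trimmed by hand against an aligned reference.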

Table 2 NLR-SUBFAMILY A in zebrafish
Table 3 NLR-SUBFAMILY B in zebrafish
Table 4 NLR-SUBFAMILY C in zebrafish (selected examples)

Subfamily A

Teleost fish possess gene orthologs for all five members of the mammalian NOD subfamily (Table 2; Table 5). While chicken and Xenopus genomes apparently lack the NOD2 gene (Table 5; Fig 1C), both are in possession of the remaining four NODs. The gene predictions for zebrafish NOD sequences were corrected using corresponding ESTs identified in the TIGR database, and missing sequence found with assistance from other fish NOD-like sequences using the BLAT (BLAST-like alignment tool) program. Following assembly, zebrafish NODs were highly structurally conserved relative to human NODs. Zebrafish NOD1 (NLR-A1) has an N-terminal CARD domain, and nine highly conserved leucine-rich repeats (LRRs) (Fig 2), although the 5' and 3' exons were not identified. Two CARD domains were identified at the N-terminal end of zebrafish NOD2 (NLR-A2), and eight LRR domains were recognizable by their LRR-like motifs (e.g. the amino acid signature LxxLxLxxCxL, where L = L, I, V or F and C = C or N) that align exactly with the LRRs of human NOD2 [see Additional file 1]. Although the N-terminal end of NOD3 (NLR-A3) was not recognized by the CDD, it shares some similarity with the human NOD3 effector domain with two conserved sequence signatures, MRK and EAG (amino acids 59–61 and 69–71 of DR-NLR-A3 respectively). The C-terminal end of NLR-A3 possessed 14 LRR motifs, which aligned exactly with LRR domains of human NOD3 [see Additional file 1], with a similar motif (CxxLxMxxNxF) between the NACHT domain and the first true LRR motif. Both NOD4 (NLR-A4) and NOD5 (NLR-A5) orthologs in zebrafish have conserved sequences within their LRR domains relative to their human equivalents, although the predicted LRR for zebrafish NLR-A4 is shorter than that of human NOD4, and less conserved relative to other human and zebrafish NLR-A orthologs. 
The N-terminal domains for NLR-A4 and NLR-A5, as with NLR-A3, were not identified within the CDD database set, but share some conserved features with the corresponding regions of mammalian NOD4 and NOD5. NLR-A4/NOD4 appears to represent the most divergent gene within the NLR-A subfamily, yet zebrafish NLR-A4 groups with high bootstrap support with human NOD4 and its orthologs from chicken and Xenopus during phylogenetic comparisons (Fig 1C). All five NLR-As are located on distinct chromosomes in zebrafish. The NLR-A1 gene is located on chromosome 16 in version 6 of the zebrafish genome (Zv6) (Table 2), but is not mapped to a chromosome in version 7 (Zv7) [see Additional file 2]. NLR-A2 resides on chromosome 7, and NLR-A3, NLR-A4 and NLR-A5 can be found on chromosomes 24, 18 and 15 respectively.
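The LRR-like signature quoted above (LxxLxLxxCxL, where L = L, I, V or F and C = C or N) can be written as a regular expression. The Python sketch below is illustrative only: the function name find_lrr_motifs and the test sequence are hypothetical, and a pattern match merely flags candidate motifs; it does not reproduce the alignment-based comparison with human NODs reported here.

```python
import re

# LxxLxLxxCxL with "L" = L, I, V or F and "C" = C or N; "." is any residue.
# The lookahead wrapper lets overlapping candidates be reported as well.
LRR_PATTERN = re.compile(r"(?=([LIVF]..[LIVF].[LIVF]..[CN].[LIVF]))")


def find_lrr_motifs(protein):
    """Return (1-based start, matched 11-mer) for each candidate LRR motif."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in LRR_PATTERN.finditer(seq)]


# Hypothetical sequence built for this example, not a real NLR.
example = "AAALEELDLSGNSLGGGIKKVWFQNCGVAA"
for start, motif in find_lrr_motifs(example):
    print(start, motif)
```

In practice such hits would still be checked against the aligned human LRRs, since short degenerate patterns of this kind also match by chance.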

Figure 2

Schematic diagram depicting the deduced protein structures of zebrafish NLRs. Zebrafish NLR subfamily-A have structures similar to the NOD subfamily in mammals, with NOD1/NLR-A1 possessing one CARD motif while NOD2/NLR-A2 possesses two. While only the NACHT domain was identified for most members of the NLR-B subfamily, one member of this group (NLR-B2) was predicted with a putative N-terminal CARD effector domain. All NLR-C subfamily members were predicted to have an N-terminal effector domain, a central NACHT domain and an LRR domain. In addition, some of the NLR-C molecules were identified with a C-terminal B30.2 domain. The predicted effector domains of molecules within the NLR-C subfamily varied; some had a pyrin (P) effector domain, while others had a distinct uncharacterized effector domain (X). B30.2 domains are also found in other important immune related molecules such as certain TRIMs and the Pyrin molecule, whose structures are also shown. C = CARD domain, P = pyrin domain, X = other domain, N = NACHT domain, L = LRR region, B = B30.2/PRY-SPRY domain, R = ring finger domain, BB = B-box, CC = coiled coil.

Table 5 NLR subfamily genes identified in other non-mammalian genomes

Subfamily B

Six distinct genes encoding NACHT domains were identified in Zv6 that belong to subfamily B and form a separate cluster within the clade of mammalian NALPs. Although zebrafish NLR-B2 and B3 were identified in distinct regions of the zebrafish genome (Table 3) these genes are identical in the region of the NACHT domain used for phylogenetic analysis. Several NALP-like sequences were also identified for Xenopus tropicalis (Table 5) that similarly formed their own cluster distinct from the human and zebrafish NALPs (Fig 1B). Gene predictions encoding putative NALPs in zebrafish are short, with most lacking a recognizable effector domain and C-terminal LRR domain. One exception is NLR-B2, which appears to have an N-terminal region with low similarity to a CARD motif as identified by searching the CDD. Only one cDNA sequence resembling this subfamily could be identified in the zebrafish EST database [GenBank:AI883819] that, although highly similar in sequence, was not an exact match to any of the predicted NLR-B genes and appeared to encode only a portion of the NACHT domain. NLR subfamily B genes appear to be restricted to small clusters on chromosomes 2 and 15 in zebrafish. Furthermore, NLR-B5 and NLR-B6 reside close (28.3–28.5 m) to NLR-A5 (32 m) on chromosome 15. Later analysis of Zv7 revealed removal of the NLR-B5 gene prediction, and its merger with the prediction for NLR-B6 [see Additional file 2].

Subfamily C

Database searches revealed multiple genes that possessed NACHT domains and shared significant homology to human NOD3 yet were distinct from the zebrafish NOD3 molecule (NLR-A3) described above. This large number of highly similar genes clearly arose from several gene(ome) duplication events. Several hundred predicted genes/proteins were observed for this group in the databases for all teleost fish (data not shown). In zebrafish, these genes were found at numerous chromosomal loci, with large clusters evident on (at least) chromosomes 1, 4, 14 and 17. A small selection of these genes was subjected to further analysis (Table 4 and Fig 1A). These molecules divided into three clusters during phylogenetic analysis, which also corresponded to sequence differences identified in the N-terminal region. Representatives from chromosome 14 were identified in all three clusters, while NLRs from some other chromosomes (e.g. chromosomes 12 and 17) were restricted to one cluster, although not all genes were included in the analysis.

Although the NACHT domains of the C-group NLRs are clearly homologous to NOD3, many of these genes were found to encode a conserved PYRIN domain at the N-terminus (Figs 2, 3), showing some analogy to mammalian NALP genes. The presence of this domain was confirmed by identifying an EST sequence containing the PYRIN domain and a partial NACHT domain that resembled C-group NLRs. Other C-group NLRs had N-terminal sequences with no obvious gene ortholog. While some are likely incorrectly predicted domains, EST sequences confirm that at least two of these predicted N-termini are transcribed in association with the NLR C-group NACHT domain (Fig 4). The NLR C-group molecules also possess an LRR region, as with other NLRs. Unexpectedly, a B30.2 (PRY-SPRY) domain was identified in several of the predicted genes for NLRs of the C subfamily. Owing to the nature of this large multigene family, a single representative EST was sequenced to confirm domain structure, including the verification of the B30.2 domain. The B30.2 domain was found at the C-terminus, following the LRR domain, and was confirmed by completely sequencing EST [GenBank:CK126487]; the 4,524 bp sequence was submitted to GenBank [GenBank:EF613347] and contained sequence from the NACHT domain to the poly-A tail. Further overlapping ESTs/TCs identified in the TIGR database provided additional confirmation for the NLR C-group, with an effector (PYRIN or other) domain, NACHT domain, LRR domain and a C-terminal B30.2 domain (Fig 4) [see Additional file 3]. However, due to the high number of closely related sequences for this subfamily and the current stage of the zebrafish genome sequence, it was not possible to ascertain whether the overlapping ESTs were generated from the same gene or distinct genes within the NLR-C family.

Figure 3

Many N-terminal effector domains of the predicted zebrafish NLR-C molecules are recognized as pyrin/PAAD-DAPIN domains based on the HMM logo. Some examples are shown, with conserved amino acids [A]. Other N-terminal sequences were observed for NLR-C molecules, which were confirmed in the EST databases at TIGR [B]. TC326097 encodes a pyrin domain, whereas TC353741 and TC343155 represent undefined N-terminal domains such as those denoted by 'X' in Figure 2.

Figure 4

A. Schematic representation of the approximate positions of NLR encoding EST sequences relative to predicted NLR-C proteins. TIGR database accession numbers for the ESTs are given. N1 = X effector domain beginning with sequence MAEERV, N2 = recognized Pyrin effector domain, N3 = X effector domain beginning with sequence MEDTHS. B. Full cDNA sequence for EST CK126487 that spans a region from the NACHT domain to the 3'UTR of an NLR-C gene. LRR signatures are indicated with a wavy underline, and the signature for the B30.2 domain is boxed. Features for polyadenylation (AATAAA) and mRNA instability (ATTTA) in the 3'UTR are double or single underlined respectively, and the stop codon (tga) is shown in bold italics.
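The two 3'UTR features named in this legend are plain sequence motifs, so their positions can be listed with a simple string search. The Python sketch below shows one way to do this; the function names and the short test sequence are hypothetical and are not derived from EST CK126487.

```python
def find_all(sequence, motif):
    """Return 1-based start positions of every occurrence, overlaps included."""
    seq, motif = sequence.upper(), motif.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(motif, i + 1)  # step by one so overlapping hits are kept
    return positions


def utr_features(utr):
    """Locate the polyadenylation signal and mRNA instability motifs."""
    return {
        "polyadenylation (AATAAA)": find_all(utr, "AATAAA"),
        "instability (ATTTA)": find_all(utr, "ATTTA"),
    }


# Hypothetical 3'UTR fragment starting at a stop codon (tga).
toy_utr = "tgaGCATTTACCATTTATTTAGGCAATAAAGTC"
print(utr_features(toy_utr))
```

Because the search advances one base at a time, adjacent instability motifs that share a base (ATTTATTTA) are reported as two separate hits.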

Figure 5

RT-PCR analysis of the NLR gene family in intestine (1 and 2), liver (3 and 4) and spleen (5 and 6) of two individual naïve zebrafish. ARP expression was amplified to verify cDNA synthesis. Negative controls were performed using templates from cDNA synthesis reactions without reverse transcriptase (7). Genomic DNA (8) was amplified to verify primer efficiency and to show the difference in size of genomic amplicons compared to cDNA amplicons.

Other NLRs

Although CIITA was evident in the genomes of the pufferfishes, frog and chicken, this molecule was not readily identifiable in zebrafish Zv6. However, later analyses identified a CIITA-like gene in Zv7 of the zebrafish genome, which resides on chromosome 3 at approximate position 24.3 m (Zv7_scaffold 244). Sequences for NAIP were not identified in lower vertebrates during this study. IPAF was identified in the frog genome (Table 5), but not in the other non-mammalian genomes. A recently described family of NLR-like genes from the sea urchin was found to cluster with mammalian IPAF and NAIP molecules during phylogenetic analyses (data not shown).

Expression of zebrafish NLRs

The spatial expression of NLR genes was evaluated in selected tissues from naïve zebrafish. NLR-A1, -A2, -A3, -A4 and -A5 were all identified in zebrafish intestine using RT-PCR. All five genes were also expressed in liver, although expression of NLR-A2 was extremely weak. NLR-A3 expression was not detected in the spleen following 35 PCR cycles, but the four other NLR-A genes were expressed in this tissue (with low expression of splenic NLR-A5 in one individual). As a representative of the NLR-B subfamily, NLR-B2 expression was investigated and detected in all three tissues. Similarly, mRNA was detected for an NLR-C gene(s) in all three tissues using primers based on the completely sequenced EST clone. The primers used to detect these genes amplified no products in control reactions whose templates were sterile water (not shown) or cDNA syntheses performed in the absence of reverse transcriptase (RT-). ARP was amplified from all tissues, confirming the integrity of the cDNAs and the success of RT-PCR. Genomic products that were larger than the cDNA amplicons were amplified for all NLR-A genes, supporting the presence of intron(s) between the primer regions.

Discussion

New insight into the regulation of essential developmental, inflammatory and apoptotic pathways was achieved with the discovery and characterization of the NLR gene family of putative cytoplasmic pattern recognition molecules. While an increasing amount of information exists for these molecules in mammals, this gene family is poorly studied in other vertebrates, with little to no information available even at the gene level for birds, amphibians and fish. This study addresses this gap by identifying and characterizing many NLR-like genes from these three classes of animals and uncovering a unique subfamily of NLRs in teleost fish.

Our evidence shows early evolution and high conservation of the NOD (NLR-A) subfamily of NLRs. All species of teleost fish that were analyzed had five distinct members of this subfamily, designated NLR-A1, NLR-A2, NLR-A3, NLR-A4 and NLR-A5, that were clear gene orthologs of human NOD1 to NOD5 [1]. A NOD1 ortholog was also described during an earlier screen of zebrafish ESTs for molecules similar to apoptosis regulators [17]. In addition to encoded NACHT domains, the effector domains and LRR regions were highly conserved in the fish NLR-A genes relative to their human equivalents, suggesting retained function. NLR-A1 and NLR-A2, the fish orthologs of human NOD1 and NOD2 respectively, both possessed clear CARD domains (one in NLR-A1 and two in NLR-A2) with high amino acid identity to the equivalent regions of human molecules. In mammalian NOD1 (and presumably NOD2), the CARD domains are necessary for the interaction with RICK kinase, an enzyme that participates in NFκB activation and, ultimately, the generation of pro-inflammatory molecules [18]. Since RICK is also present in fish genomes (see zebrafish RIPK2, Q4V958), it would appear that this inflammatory cascade was established prior to the divergence of teleost fish from the tetrapod lineage, assuming that the same interaction occurs between these molecules in fish. The highly conserved sequences in the LRR domains imply that zebrafish NLR-A1 and NLR-A2 may also be able to recognize meso-DAP and muramyl dipeptide, as do mammalian NOD1 and NOD2 respectively [13, 14], although this requires formal confirmation. NLR-A1 transcript was detected equally in intestine, liver and spleen, reflecting the wide-spread distribution observed for murine NOD1 [18], while NLR-A2 was strongly expressed in intestine, with some expression in spleen and barely detectable levels in liver. 
Similar to the highest expression of NLR-A2 in zebrafish intestine, human NOD2 has a more restricted expression pattern, with predominant expression in cells of myeloid origin including monocytes [9] and Paneth cells [19] that are associated with the gut, although expression of NOD2 can also be induced in epithelial cells [20]. Zebrafish NLR-A3 is clearly an ortholog of mammalian NOD3, with similarity in the effector and NACHT domains and an equal number of LRR domains. At the genomic level, NOD3 is flanked by RHOT2, SBK1 and PDPK1, GNPTG respectively in zebrafish and fugu, further supporting the orthologous relationship for NOD3 between fish species. Expression of NLR-A3 was strong in zebrafish intestine, with some expression also in liver and little to no expression observed in the spleen. Interestingly, the kidney (bone marrow equivalent) did not express NLR-A3 either, suggesting that it is not expressed by lymphocytes (data not shown). In mammals, NOD3 expression occurs primarily in lymphocytes and is attributed to inhibition of T-cell activity [21]. Two other NLR-A subfamily members were also identified in zebrafish and designated NLR-A4 and NLR-A5, with NLR-A4 resembling human NOD4 and NLR-A5 being highly conserved relative to human NOD5. NLR-A4/NOD4 genes represent the most divergent members of this subfamily based on amino acid conservation within the N-terminal and LRR regions between different vertebrate orthologs. Both NLR-A4 and NLR-A5 genes were constitutively expressed in intestine, spleen and liver of naïve zebrafish, although there is clearly some fish to fish variation preventing their detection in some individuals under the conditions used for RT-PCR. Currently, there is no information concerning the expression patterns or functions of these latter two NLRs in mammals.

Whereas NOD1, NOD3, NOD4 and NOD5 appear to be conserved in bird and amphibian genomes, the gene for NOD2 was identified in neither the chicken nor the frog genomes. This would suggest that NOD2 has been deleted from the genomes in these species, although the genome of Xenopus tropicalis is, at present, incomplete. This is surprising since NOD2, in mammals, appears to be a highly important sensor for intracellular microbial molecules. However, chickens do possess a NALP3 ortholog (see below) representing another potential PRR for muramyl dipeptide [22] and may functionally replace NOD2 in this species.

Members of the NALP subfamily are also evident in lower vertebrates. Six genes were identified in zebrafish (Zv6) for NALP-like molecules (NLR-B1 to -B6), and ten predicted NALP-like genes (nicknamed NALPa to NALPj) were found for Xenopus. These genes clustered separately for each species, suggesting recent duplication events formed the NALP subfamilies independently in fish, amphibians and mammals. The closest human ortholog of the amphibian and fish NALPs appears to be NALP6. A single NALP-like sequence predicted in the chicken genome (ENSGALG00000005155) and in the Uniprot database (Q5F3J4) clusters closest to the group of human NALPs 1, 3, 10, and 12 when analyzed phylogenetically (although not with strong bootstrap support) and has recently been given the name NLRP3 (previously designated CIAS1/NALP3). Chicken NALP was identified on chromosome 5, separate from the chicken NOD5 gene (chromosome 24). Although sequence variation makes accurate comparisons difficult, it is likely that this chicken gene arose from a different NALP than the fish and amphibian NALPs, with the ancestral NALP(s) possibly lost from the genome. The discovery of multiple NALP-like proteins in lower vertebrates contradicts a recent hypothesis by Hughes suggesting that the NALP subfamily evolved only in mammals [15], with clear evidence that a gene encoding the NACHT domain of at least one NALP (possibly a NALP6-like gene) was present prior to the fish-tetrapod split. Zebrafish NALPs are situated at two distinct chromosomal locations: four of these genes (NLR-B1 to -B4) are located on chromosome 2, and the other two (NLR-B5 and -B6) can be found near NLR-A5 on chromosome 15; the new assembly of the zebrafish genome (Zv7) suggests these two sequences may represent the same gene. It should be pointed out that although chicken NLRP3 has an N-terminal PYRIN domain, the N-terminal domains for the Xenopus and zebrafish NALPs were not identified. 
One exception was NLR-B2, which appears to have a domain that resembles a CARD and not a PYRIN domain as would be expected from its similarity to the mammalian NALPs. No PYRIN domains are observed for the Xenopus NALP-like sequences and, other than the PY-CARD protein (prediction ENSXETT00000004042), no PYRIN domains were predicted in the Xenopus genome. These observations may reflect that early ancestors of NALPs lacked these effector domains and later acquired the PYRIN domain (or CARD domain in the case of NLR-B2). Whether these NALP-like genes encode functional PRRs in poikilotherms remains uncertain; however, NLR-B2 transcript was detectable in zebrafish intestine, spleen and liver, suggesting this may represent a functional gene.

In addition to the NOD- and NALP-like subfamilies, a unique subfamily of NLRs was identified in teleost fish, and designated NLR subfamily C (NLR-C). This subfamily is interesting for several reasons. Firstly, all teleostei genome (and EST) databases show numerous NLR-C genes, amounting to several hundred of these genes in a single species. Secondly, these genes all possess a central NACHT domain that is highly similar to the NACHT domain of NOD3 (NLR-A3), suggesting they evolved from an NLR-A3-like molecule, yet many of these genes possess a PYRIN domain at their N-terminus, making them more structurally similar to mammalian NALP molecules. Finally, following the LRR domain many of these molecules (representatives found in all bony fish) possess a B30.2 (PRY-SPRY) domain, which may allow them to interact with molecules distinct from those bound by standard NLRs and thus perform some novel function. B30.2 domains are also found on some tripartite motif containing (TRIM) proteins [23] and on the PYRIN molecule [24] (Fig 2) and have several roles related to immunity. TRIM5α has been shown to inhibit retroviral activity by directly binding the capsid of the HIV retrovirus [25], and PYRIN has been shown to inhibit the activity of caspase-1 by directly binding to the active site of this enzyme [24], both using their B30.2 domains for these interactions. Each of these functions would fit with the role of NLRs as intracellular PRRs; the ability to bind viruses could be an extension of the pattern detection system attributed to the neighboring LRR domain, while the potential to inhibit caspase-1 activity may make NLR-C molecules important negative regulators of the inflammasome in teleost fish. The latter function would reflect gene families of cell surface receptors such as killer immunoglobulin-like receptors (KIRs) [26] or novel immune-type receptors (NITRs) [27] that possess many inhibitory receptors and a small number of stimulatory receptors for controlling cellular activation. 
It is also interesting that these molecules all contain a NACHT domain similar to NOD3, since mammalian NOD3 has an inhibitory role in T cells [21]. Importantly, since the predicted N- and C-termini of some NLR-Cs are structurally similar to the two domains of the PYRIN molecule, this would also fit with a potential function of mimicking PYRIN. However, additional studies are required to determine what, if any, role in the immune system NLR-C molecules may play.

The evolutionary processes generating the vast subfamily of NLR-C genes are not clear and appear very complex. The relationships are further confused by apparent errors in the assembly of the zebrafish genome (Zv6 versus Zv7), as evidenced by clear differences in the mapping of some NLR-C genes to their predicted chromosomes between assembly versions [see Additional file 2]. However, evidence suggesting tandem duplications of individual genes within a chromosomal locus is consistent between Zv6 and Zv7, which result in NLR-C genes adopting new exons encoding distinct N-terminal domains and/or C-terminal domains via exon-shuffling. The clusters of tandem NLR-C genes appear to have undergone en bloc duplication, to generate further clusters in the same locus (cis duplication), or within distinct loci or chromosomes (trans duplication) through translocation. Single genes may also have duplicated independently multiple times, within established loci and to create new loci, prior to and following formation of new gene structures. A large scale duplication of this gene family may be explained by the teleost-specific genome duplication event (3R) occurring early in the evolution of teleost fish, which followed two rounds of complete genome duplication (2R) observed early in the evolution of the vertebrate lineage [28]. Should this be the case, mutations and deletions of many of the duplicated genes would be expected, to remove redundancy from the genome [29]. Therefore, many NLR-C genes may be non-functional genes or pseudogenes, although a small number have likely established new functions. Clearly, it is too early to assign functionality to these genes, except to note that many are transcribed and are presumably translated into protein products. Transcripts for NLR-C were detected, in this study, in three distinct tissues in naïve zebrafish, and many more can be identified in the EST databases for this fish species.

A CIITA-like gene was identified in the zebrafish genome (Zv7). In mammals, CIITA is an important molecule for controlling the expression of both major histocompatibility complex class I and class II molecules and is therefore significant for antigen presentation to T lymphocytes. Defects in human CIITA gene expression have been linked to several immune disorders [5]. However, alternative molecules have been implicated in the induction of antigen presentation pathways [30], including other members of the NLR family, such as NALP12 [31]. NAIP/IPAF homologs have been identified in the sea urchin [16], implying that the ancestral NLR resembled one of these molecules. However, neither NAIP nor IPAF was identified in the fish genomes at this time, although IPAF was evident in the frog genome, suggesting that the genes for these molecules may have been lost from the fish genomes during the teleost-specific genome duplication event.

Conclusion

In summary, the NLR gene family contains several members in all vertebrates, and at least one prototypical gene must have existed prior to the evolution of vertebrates. Clearly, there are some losses and gains of NLR genes in the genomes of distinct species thus shaping unique repertoires of these molecules throughout vertebrates and invertebrates. Although there are still many members of the NLR family that require functional characterization, their implication as regulators of immunity is highly intriguing and warrants future investigation.

Methods

Identification of NLRs in non-mammalian vertebrates

The amino acid sequences for human NLRs were obtained from UNIPROT (Release 9.0) [32]. These are listed in Table 1, with recently defined nomenclature assigned at HGNC [3]. Predicted genes for non-mammalian NLRs were identified in the UNIPROT database and at ENSEMBL [33] for chicken Gallus gallus, pipid frog Xenopus tropicalis, Japanese pufferfish Fugu rubripes, green spotted pufferfish Tetraodon nigroviridis and zebrafish Danio rerio, by using the BLAST algorithm to search for sequences with similarity to human NLRs [34]. Gene predictions were also identified in ENSEMBL by keyword searches for NACHT, PYRIN or CARD domains. EST sequences for these species were identified by BLAST searches of the TIGR gene indices [35] or the "other vertebrate EST" section of GenBank at NCBI, and used to confirm and correct the gene predictions. The unique NACHT-LRR-B30.2 arrangement for NLR-C was determined by completely sequencing zebrafish EST CK126487 [GenBank:EF613347]. The EST was obtained from the American Type Culture Collection (ATCC) (Image number 7049223) and sequencing was carried out in-house using an ABI 3030 automated sequencer, universal (SP6/T7) and gene specific primers (listed in Table 6) and BigDye V3.1.

Table 6 Oligonucleotide primers used for RT-PCR analyses and sequencing

Chromosomal locations for the NLRs were deduced by matching the translated NLR sequences against the genomes using BLAT [36] at the UCSC Genome Browser database [37]. Specific domains within the zebrafish NLRs were confirmed by searching the Conserved Domain Database (CDD v2.09) [38] at NCBI, by comparison to the PFAM hidden Markov model (HMM) logos [39] and by direct comparison to putative mammalian orthologs. The genome versions used during these analyses were G. gallus assembly version 2.1 (May 2006), X. tropicalis assembly version 4.1 (August 2005), T. nigroviridis assembly version 7 (February 2004), T. rubripes assembly version 3 (August 2002) in BLAT and version 4 (December 2005) in ENSEMBL, and D. rerio assembly version 6 (March 2006). Following submission of this manuscript, assembly version 7 of the zebrafish genome (Zv7) became available, and all gene predictions were reanalyzed against this assembly. Data from Zv7 are available in supplementary tables [see Additional file 2].

Phylogeny of NLRs

The phylogenetic relationships between zebrafish NLRs and human NLRs were predicted using both the minimum evolution and neighbor-joining methods within the MEGA 3.1 program [40]. Partial amino acid sequences from the NACHT domain (from the region corresponding to the GxxGxGKS motif through to the FAAFY sequence signature of human NOD2) were used in the analyses, as this region was clearly identified in all NLRs. Further analyses of the NLR-A and NLR-B subfamilies, including frog and chicken NLRs, were performed using the same methods. All trees were constructed from CLUSTALW-generated alignments [41] using Poisson correction, complete deletion of gaps and 1000 bootstrap replicates.
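As an illustration of the distance settings named above, the following is a minimal Python sketch of the Poisson-corrected distance, d = -ln(1 - p), computed after complete deletion of gapped alignment columns, where p is the proportion of differing sites. It is not the MEGA implementation; the function names and the toy sequences are hypothetical and chosen only for the example.

```python
import math


def complete_deletion(alignment):
    """Remove every alignment column that contains a gap in any sequence.

    `alignment` maps a sequence name to an aligned string; all strings
    must have the same length. Names and layout are illustrative only.
    """
    names = list(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) for gap-free sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    p = differences / len(seq_a)
    if p >= 1.0:
        raise ValueError("p = 1: the Poisson distance is undefined")
    return -math.log(1.0 - p)


def distance_matrix(alignment):
    """Pairwise Poisson distances after complete deletion of gaps."""
    clean = complete_deletion(alignment)
    names = list(clean)
    return {(a, b): poisson_distance(clean[a], clean[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


if __name__ == "__main__":
    # Toy sequences, not real NLR data.
    toy = {
        "seqA": "GSAGIGKSTL",
        "seqB": "GSAGIGKSAL",
        "seqC": "GPAG-GKSAL",
    }
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

A matrix of this kind is the input that distance-based tree methods such as neighbor-joining operate on; bootstrapping would repeat the calculation on alignment columns resampled with replacement.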

Expression of zebrafish NLRs

The specific expression patterns of the NLR gene family in zebrafish tissues were investigated. Zebrafish (Ekwill strain) were obtained from Ekwill Fish Farm, FL and reared in sand-filtered and UV-treated freshwater at a constant temperature of 24°C. Fish were fed a daily ration of adult zebrafish diet (Zeigler). Genomic DNA was extracted from fin tissue using the DNeasy extraction kit (Qiagen) following the manufacturer's instructions. The spleen, liver and intestinal tissues were removed from two individuals and RNA was extracted using the RNeasy RNA extraction kit with in-column DNase treatment (Qiagen) following the manufacturer's instructions. Total RNA was purified and cDNA was synthesized as previously described [42]. A control, containing liver RNA but lacking reverse transcriptase, was also synthesized. The 20 μL cDNA synthesis reactions were diluted to a final volume of 100 μL and stored at -20°C until use. PCR amplifications were performed in a 25 μL final reaction volume containing 2 μL of diluted cDNA, reagents from the Taq core PCR kit (Qiagen) and 12.5 pM of each primer. Primer pairs used to detect transcripts for each NLR gene are listed in Table 6 with their sequences. Cycling conditions for all amplifications consisted of 95°C for 3 min, followed by 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 1 min, and a final extension of 10 min at 72°C. Amplified products were subjected to electrophoresis on a 3% agarose gel and visualized by ethidium bromide staining.
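The volumes and cycling conditions given above imply some simple arithmetic, sketched here in Python. The helper names are hypothetical, and the programme length counts only the stated hold times (instrument ramp times are not given in the text and are ignored).

```python
def dilution_factor(initial_ul, final_ul):
    """Fold dilution when a reaction is brought from initial to final volume."""
    return final_ul / initial_ul


def undiluted_equivalent_ul(template_ul, initial_ul, final_ul):
    """Volume of the original cDNA reaction carried in the diluted template."""
    return template_ul * initial_ul / final_ul


def programme_seconds(initial_denaturation_s, cycles, step_seconds,
                      final_extension_s):
    """Total programmed hold time of a PCR run, excluding ramping."""
    return (initial_denaturation_s
            + cycles * sum(step_seconds)
            + final_extension_s)


if __name__ == "__main__":
    # 20 uL cDNA synthesis reaction diluted to 100 uL: a five-fold dilution.
    print(dilution_factor(20, 100))
    # 2 uL of diluted cDNA per 25 uL PCR, expressed as undiluted reaction volume.
    print(undiluted_equivalent_ul(2, 20, 100))
    # 95 C for 3 min; 35 x (30 s + 30 s + 60 s); final extension of 10 min.
    total = programme_seconds(180, 35, (30, 30, 60), 600)
    print(total, "s =", total / 60, "min")
```

Under these figures each PCR receives the equivalent of 0.4 μL of the original cDNA synthesis reaction, and the programmed hold times sum to 4980 s (83 min).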