Introduction

The Rel homology domain (RHD) is an evolutionarily conserved motif of approximately 300 amino acids that was first recognized in the transforming gene of the avian reticuloendotheliosis virus (Chen et al. 1981). It is present in the N-terminal region of proteins belonging to two transcription factor families: Rel/nuclear factor-kappa B (NF-κB) and nuclear factor of activated T-cells (NFAT). Members of the Rel/NF-κB family play major regulatory roles in immune response, inflammation, apoptosis, embryonic development, and differentiation (reviewed in Baldwin 1996). A wide array of extracellular stimuli result in NF-κB activation, including inflammatory cytokines, viral and bacterial infections, oxidative and DNA-damaging agents, UV light, and osmotic shock (Baldwin 1996).

The NFAT gene family is involved in T cell receptor (TCR) activation (NFAT1/c2, NFAT2/c1 and NFAT4/c3), inflammatory responses (NFAT3/c4), and osmotic balance regulation (NFAT5, also called tonicity enhancer binding protein, TonEBP) (reviewed in Macian 2005). There is evidence that the NFAT signaling pathway also participates in the regulation of cell growth, development, and survival in different tissues and cell types (reviewed in Hogan et al. 2003; Benedito et al. 2005).

Both families of transcription factors need to form homo- and/or heterodimers to translocate to the nucleus, bind DNA, and modulate gene transcription. The RHD is crucial for these events, as it possesses a DNA-binding domain; a dimerization domain; a nuclear localization signal (NLS); and, in the case of Rel/NF-κB proteins, a region to interact with inhibitory kappa-B proteins (IκB) (Ghosh et al. 1995; Müller et al. 1995). Rel transcription factors have an N-terminal RHD and a C-terminal transcriptional activation domain, and include Dorsal and Dif from Drosophila (Steward 1987; Ip et al. 1993) and Rel (also known as c-Rel), RelA (also known as p65), and RelB in vertebrates (Brownell et al. 1985; Ruben et al. 1991, 1992). NF-κB transcription factors are synthesized as large precursors comprising an N-terminal RHD and a C-terminal region consisting primarily of ankyrin (ANK) repeats, and include NF-κB 1 (p50/p105) and NF-κB 2 (p52/p100) in vertebrates (Sen and Baltimore 1986; Mercurio et al. 1992; Ghosh et al. 1995; Müller et al. 1995) and Relish in insects (Dushay et al. 1996). In NFAT proteins, the RHD is centrally located and is bounded by longer N- and C-terminal domains that vary in length depending on the spliced forms. A moderately conserved NFAT-homology region (NHR) is present at the N-terminus and contains the docking sites for calcineurin and NFAT kinases. There are five members of the NFAT family in vertebrates (reviewed in Macian 2005) and a single NFAT member in Drosophila, which is mostly similar in structure and function to vertebrate NFAT5 (Keyser et al. 2007).

All members of Rel/NF-κB and NFAT families need to be activated to translocate to the nucleus. NFAT activation occurs in response to calcineurin dephosphorylation of their serine residues (Loh et al. 1996). The Rel/NF-κB proteins are activated via a different mechanism. The IκB proteins retain Rel/NF-κB homodimers and heterodimers in the cytoplasm by binding to their RHD and interfering with their NLS function (Baeuerle and Henkel 1994; Siebenlist et al. 1994). The ANK repeats present in IκB proteins mediate binding to the RHD. These repeats are generally around 33 amino acids long and form a β-hairpin–helix–loop–helix \(\left(\beta_{2}\alpha_{2}\right)\) structure. Each repeat contains a variant of the tetrapeptide sequence Thr–Pro–Leu–His (TPLH), which is involved in the α-helix formation (Sedgwick and Smerdon 1999). The IκB proteins are encoded either by separate genes (e.g., cactus, IκBα, IκBβ, IκBε, IκBγ) or at the 3′ end of NF-κB genes (Baeuerle and Henkel 1994; Siebenlist et al. 1994). In either case, activation of the NF-κB pathway leads to site-specific phosphorylation and ubiquitination of IκB. This results in IκB proteolysis (Chen et al. 1995; DiDonato et al. 1995; Traenckner et al. 1995), allowing the Rel/NF-κB dimer to translocate to the nucleus, where it can regulate relevant target genes.
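
As an illustration of how variants of the TPLH tetrapeptide might be located in a protein sequence, the following minimal Python sketch scans for four-residue windows that differ from TPLH at no more than one position. The one-mismatch tolerance, the function name, and the toy sequence are assumptions made for this example; they are not taken from this study and do not define what counts as an ANK repeat.

```python
MOTIF = "TPLH"  # Thr-Pro-Leu-His tetrapeptide described in the text


def find_tplh_like(seq, motif=MOTIF, max_mismatches=1):
    """Return (0-based position, window) pairs whose window differs from
    `motif` at no more than `max_mismatches` residues.

    The mismatch tolerance is an illustrative assumption, not a published
    criterion for ANK repeat detection.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, window))
    return hits


# Hypothetical toy sequence (not a real protein) with one exact and one variant motif.
toy_sequence = "MAGDTPLHLAAKNGSAPLHVA"
tplh_hits = find_tplh_like(toy_sequence)
```

A simple window scan of this kind only flags candidate positions; confirming an ANK repeat would still require alignment against known repeats.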

Because ANK repeat domains are present in the genomes of disparate eukaryotes (Bork 1993), this domain is likely to have been present in the last common ancestor to living multicellular opisthokonts (i.e., metazoans, fungi, and protist allies); the evolutionary origin of the ANK repeats of the IκB subclass remains unclear. In contrast, the RHD has previously been documented only in bilaterians and cnidarians (Sullivan et al. 2007), and its genesis has not been resolved (Huguet et al. 1997). It has been proposed that an ancestral ANK repeat domain acquired the metazoan-specific RHD to give rise to the bilaterian NF-κB genes and that a subsequent scission event resulted in the origin of Rel factors and IκB inhibitors; loss of the ANK repeat also yielded the NFAT factors (Huguet et al. 1997).

The recent detection of two RHD-containing proteins in the cnidarian Nematostella vectensis – one truncated NF-κB, which lacks ANK repeats, and one NFAT gene – is the first evidence of the Rel/NF-κB and NFAT families predating the origin of the Bilateria (Sullivan et al. 2007). The presence of an NF-κB gene in N. vectensis that lacks a C-terminal ANK repeat domain is compatible with the proposal that the bilaterian NF-κB arose via the shuffling of initially independent RHD and ANK repeats. This event would have occurred in the bilaterian lineage after it diverged from the cnidarian lineage. However, equally parsimonious is the scenario that the last common ancestor to living cnidarians and bilaterians possessed a conventional NF-κB, and the ANK repeats were secondarily lost in N. vectensis.

By screening the sequenced genome of the demosponge Amphimedon queenslandica for RHD-containing genes, we test whether the origin of the conventional NF-κB is a bilaterian-specific innovation or has a more ancestral origin. Because sponges are considered to be an earlier branching lineage than cnidarians and quite possibly the earliest lineage of living animals (Borchiellini et al. 2001; Medina et al. 2001; Cavalier-Smith and Chao 2003), they are ideal to address questions regarding the origin of metazoan-specific gene families. A complete genome sequence allows for detailed comparisons with eumetazoan and other opisthokont genomes and for the identification of evolutionary events leading to the ancestors from which stemmed all modern metazoans (e.g., Adamska et al. 2007b; Larroux et al. 2007; Simionato et al. 2007). Here we show that A. queenslandica has a single RHD that is part of a fully formed NF-κB, as found in diverse bilaterians, supporting the proposition that this gene originated before metazoan cladogenesis and that the N. vectensis NF-κB secondarily lost its ANK repeats. The lack of RHDs in sequenced fungal and choanoflagellate genomes pinpoints the origin of this domain to the lineage leading to the ancestor of all living metazoans. The expression of NF-κB during A. queenslandica embryogenesis lends support to the proposition that this gene had an ancient role in development.

Materials and methods

Identification of ANK repeat and RHD-containing genes

Genomic draft assemblies, traces and expressed sequence tag (EST) databases of A. queenslandica, the placozoan Trichoplax adhaerens and the choanoflagellate Monosiga brevicollis were generated as part of collaborative genome projects with the Joint Genome Institute (http://genome.jgi-psf.org/euk_home.html) and are publicly available on http://www.ncbi.nlm.nih.gov/. To identify candidate genes containing a RHD and ANK repeats in A. queenslandica, a tBLASTn algorithm was used to search assemblies, traces, and ESTs for similarity with the ANK repeat and the RHD motif consensus. Selected A. queenslandica traces were assembled using an in-house assembly pipeline (Larroux et al. 2007). The open reading frames (ORFs) of the putative NF-κB were predicted from genomic sequences using the available EST sequences and the GENSCAN splice site prediction program (http://www.genes.mit.edu/). A primary genome assembly from the Joint Genome Institute was later consulted. Because ESTs corresponding to the A. queenslandica NF-κB gene were truncated, RACE PCRs were performed using the SMART kit (Clontech) method to generate full-length cDNA and confirm our predictions (gene-specific oligonucleotide primer sequences available upon request).
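
A tBLASTn search compares a protein query against nucleotide sequences translated in all six reading frames. The Python sketch below illustrates only that translation step, using the standard genetic code; it is not the BLAST implementation and performs no similarity scoring. The function names and the short test sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Hypothetical six-base example: ATG GCC on the forward strand.
example_frames = six_frame_translations("ATGGCC")
```

Each translated frame can then be compared with the protein or motif consensus of interest, which is the part that the search algorithm itself handles.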

Phylogenetic analyses and domain organization

The derived amino acid sequence of AmqNF-κB RHD was aligned to a selection of 21 other RHD-containing proteins (Fig. S1a–b), whose accession numbers are given in Table S1. The alignments were extended to the NLS, situated upstream of the RHD, because it is highly conserved and closely associated with the RHD. Two separate phylogenetic analyses were performed on AmqNF-κB RHD. Because no NFAT match was detected in A. queenslandica, the first analysis was run on an alignment that was limited to highly conserved characters of the domain (300 characters, see Fig. S1a) and therefore did not include members of the NFAT family; the RHD of NFATs is divergent compared to Rel/NF-κB members of the RHD-containing family (Huguet et al. 1997; Graef et al. 2001). To further resolve the relationship of AmqNF-κB to other RHD-containing proteins, a second analysis was performed that used an alignment (Fig. S1b) based on the one published in Sullivan et al. (2007), which included the divergent NFAT sequence as an outgroup.

A separate alignment was also performed on the ANK repeats of AmqNF-κB and 10 other NF-κB- and IκB-related proteins (214 characters). Non-IκB ANKs were used as an outgroup. Only the first six repeats were used because the seventh repeat is not present in IκBs (Fig. S1c). All alignments were performed using ClustalX 1.64b (Thompson et al. 1994). They were then edited visually in the Sequence Alignment Program Se-Al v1.d1, Sequence Alignment Editor (available at http://evolve.zoo.ox.ac.uk), and regions of uncertain alignment were removed.

Distance and parsimony analyses were performed using the PHYLIP v3.6 package (Felsenstein 2003). Distance neighbor joining (NJ) analyses with 1,000 bootstraps were performed using Seqboot, Protdist, Neighbor, and Consense with default settings. Parsimony analyses with 1,000 bootstraps were performed using Seqboot, Protpars, and Consense with default settings. Bayesian analyses were performed as per Larroux et al. (2006) but for one million generations. Intron–exon boundaries and domain organization were evaluated using the program Gene Structure Draw (available at http://warta.bio.psu.edu/cgi-bin/Tools/StrDraw.pl).
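
The bootstrap step can be pictured as resampling alignment columns with replacement, so that each replicate has the same number of sequences and characters as the original alignment. The Python sketch below illustrates this idea only; it is not PHYLIP's Seqboot code, and the function names, random seed, and toy alignment are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    The returned replicate has the same names and the same length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # Draw column indices with replacement; the same draw is applied to every sequence.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns) for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate `n_replicates` resampled alignments (seed chosen arbitrarily)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment; real input would be the edited RHD or ANK alignment.
toy_alignment = {"seqA": "MKVLD", "seqB": "MRVLE", "seqC": "MKILD"}
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000)
```

In the workflow described above, a tree is then inferred from each replicate and the replicate trees are summarized as a consensus, with bootstrap support reported as the percentage of replicates recovering a given grouping.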

Whole mount in situ hybridization

Adult specimens of the sponge A. queenslandica (Porifera, Demospongiae, Haplosclerida, Niphatidae) were collected on Heron Island Reef, Great Barrier Reef, Australia as described in Leys and Degnan (2001, 2002). Single-probe in situ hybridization was performed as described in Larroux et al. (2006). Three different probes spanning different conserved areas of the gene were used. Detailed protocol and probe details are available upon request.

Results and discussion

Identification of ANK repeat and RHD-containing genes

We did not detect any RHD in the genomes of the placozoan T. adhaerens and the choanoflagellate M. brevicollis. Multiple ANK repeat domains were detected in T. adhaerens and M. brevicollis, but none of these displayed significant similarity to those found in metazoan NF-κB and IκB genes. This strongly suggests that the placozoan and the choanoflagellate do not have members of the Rel/NF-κB and NFAT families. On the other hand, we identified a gene with a RHD and an IκB-like ANK repeat domain in the ESTs and genome traces of A. queenslandica. The full-length cDNA sequence amplified by RACE confirmed the Genscan genomic/mRNA sequence predictions. We found no evidence for the presence of these domains in other genes amongst the A. queenslandica ESTs, genome assemblies, and traces, implying that A. queenslandica only possesses a single representative of the NF-κB/IκB gene families.

The sponge NF-κB gene is very similar to human NF-κBs

We determined that the A. queenslandica NF-κB gene (AmqNF-κB) encodes a conceptual 1,095 amino acid protein (Fig. S2). The AmqNF-κB domain organization is very similar to human NF-κB1 in that it contains a short, 47-amino acid N-terminal sequence, a 312-amino acid RHD, a glycine/serine-rich region (that serves as a processing signal for the generation of p50), seven ANK repeats, a potential PEST domain, and a DEATH domain (Fig. 1). AmqNF-κB possesses an extra stretch of amino acids after the glycine–serine-rich region that makes it a longer protein than human NF-κB 1 and 2 (Fig. 1b). While AmqNF-κB is strikingly similar to the human orthologue, it differs from the N. vectensis NF-κB (NvNF-κB), which does not possess ANK repeats (Figs. 1 and 2).

Fig. 1

a Protein sequence of A. queenslandica NF-κB and b domain architecture of A. queenslandica NF-κB and human NF-κB 1. The Rel domain (RHD) is divided into RHD1 (DNA-binding specificity) and RHD2/IPTG (dimerization domain); NLS nuclear localization signal

Fig. 2

Comparison of the exon architecture of different members of the RHD and ANK repeat containing genes across phyla. Exons are aligned and the introns are not drawn to scale. Black stars are placed over splice sites (5′ end of downstream exon) in exons encoding the RHD, NLS, and ANK repeats that AmqNF-κB shares with other NF-κB/Rel and IκB members. Gray stars show conserved splice sites not found in A. queenslandica. Amq A. queenslandica, Dm Drosophila melanogaster, Hs Homo sapiens, Nv N. vectensis. See Figs. 3 and S3 for sequence alignments

The AmqNF-κB RHD shares a number of key features with Rel and NF-κB proteins, including: (1) a highly conserved DNA recognition loop sequence (aligned residues 12–21), (2) a conserved CDKVQK sequence (aligned residues 259–264), and (3) a basic nuclear localization sequence (aligned residues 357–361) (Fig. 3). AmqNF-κB shares a stretch of approximately 35 amino acids with the NF-κB group RHDs that is absent in the Rel and NFAT subfamilies (aligned residues 126–172, Fig. 3). The protein kinase A recognition serine is also present in AmqNF-κB (aligned residue 330, Fig. 3). AmqNF-κB has retained the highly conserved redox sensitive cysteine in the DNA-binding loop (aligned residue 20, Fig. 3), contrary to NvNF-κB and Drosophila Relish, which harbor a serine instead. In p52, DNA binding is enhanced when this specific residue is reduced and the cysteine is also required in vitro for NF-κB to maintain DNA binding specificity (Matthews et al. 1992, 1993).

Fig. 3

Comparison of the amino acid sequence of AmqNF-κB Rel domain with other members of the Rel family. Codons that incorporate an intron–exon boundary are shown in red. Abbreviations are as in Fig. 2

Analysis of the genomic contig containing AmqNF-κB reveals that the entire gene is dispersed over 10.3 kb and is composed of 25 exons, compared with 24 in the human NF-κB1 gene and 23 in the NF-κB2 gene (Fig. 2). Exons range from 41 to 441 bp and introns from 47 to 1,084 bp in length. The intron–exon organization is highly conserved in the Rel domain, with AmqNF-κB sharing nine intron sites with its human and N. vectensis orthologues. However, AmqNF-κB ANK repeats span eight introns compared to seven in human NF-κB1 and 2. This is not surprising because it has been shown that the ANK repeats evolve faster than the RHD, mainly due to the strong constraint of the DNA binding role of the RHD (Huguet et al. 1997). Interestingly, the extra intron–exon boundary in AmqNF-κB (between exons 5 and 6) is present in some IκBs (Figs. 2 and S3).

Our first phylogenetic analyses (parsimony, distance, and Bayesian) were performed on a RHD alignment, which only contained phylogenetically conserved characters and did not include an outgroup. However, the unrooted tree we obtained did not resolve the relationship between AmqNF-κB and other RHD-containing proteins (Fig. 4a). A second analysis, which used a less-stringent alignment with NFAT as an outgroup, placed AmqNF-κB separate to the early diverging Relish, at the base of the NF-κB/Rel clade (Fig. 4b). Although the bootstrap support was significant in all analyses performed, this result may be misleading as the outgroup may force the long branches, such as the Relish sequences, to strongly influence the topology of the ingroup. When the same analyses were run, excluding Relish, the position of AmqNF-κB was indeed no longer resolved, lending further support to long-branch attraction artifacts (data not shown).

Fig. 4

Phylogenetic relationships between metazoan RHD and ANK-repeat containing proteins by distance, parsimony, and Bayesian analyses. Only the neighbor-joining phylogenetic trees are shown here. a An unrooted phylogram of the RHD-containing proteins. b A rooted phylogram of RHD-containing proteins, with NFAT as an outgroup. c A rooted phylogram of ANK-repeat containing proteins with non-IκB ANKs as outgroups. Families and higher-level groupings are shown at the right of the tree. Percentages of bootstrap support (1,000 replicates) greater than 50% are given at key nodes (in blue, parsimony, in black, neighbor-joining). An asterisk indicates high Bayesian support (posterior probability greater than or equal to 95%). Sponge genes are in red, cnidarian genes in blue. Abbreviations as in Fig. 2. Ag Anopheles gambiae, Cg Crassostrea gigas, Ci Ciona intestinalis, Dr Danio rerio, Hr Halocynthia roretzi, Rv Avian reticuloendotheliosis virus, Sp Strongylocentrotus purpuratus

The phylogenetic analyses of the ANK repeat domain grouped AmqNF-κB in the NF-κB clade. Support for this grouping was significant in Bayesian analyses (posterior probability >95%) but low in both the neighbor-joining and parsimony analyses (bootstrap support ≤30%) (Fig. 4c). Therefore, the combined phylogenetic analyses lend further support to AmqNF-κB being a member of the NF-κB family.

AmqNF-κB is developmentally expressed

Because the NF-κB signaling system has a developmental role in both vertebrates and insects (reviewed in Hayden and Ghosh 2004), we assessed the expression of AmqNF-κB during embryogenesis and larval development by whole-mount in situ hybridization. During A. queenslandica development, cleavage produces a bilayered embryo comprising multiple cell types. Following cleavage, small pigment cells form a spot at the posterior pole, and subsequently, these pigment cells migrate outwards to form a ring responsible for photosensitivity and directional larval movement (Leys and Degnan 2002; Adamska et al. 2007a).

AmqNF-κB transcripts are first detected broadly throughout embryos that have completed cleavage. Expression is notably strong in large granular cells (Fig. 5a–c). In early- and late-spot-stage embryos, AmqNF-κB expression is restricted to these granular cells, which migrate to the outer layer (Fig. 5d,g), and can also be seen within the posterior pigment spot (Fig. 5d–i). In late-ring-stage embryos, AmqNF-κB can still be detected in granular cells populating the middle cell layer (the subepithelial layer) and the outer epithelial layer (Fig. 5j–l). Transcripts are also present in flask cells (Fig. 5k–l), which are large ciliated cells that are interspersed amongst the columnar epithelium and express a range of genes whose orthologues play a role in eumetazoan neurons (Sakarya et al. 2007). Compared to late ring stages, the larva shows a very similar expression pattern, but transcripts are absent altogether from the outer epithelial layer and more punctate in the flask cells (Fig. 5m–o).

Fig. 5

Expression of A. queenslandica NF-κB in embryos and larvae. a–c Postcleavage stage, d–f early spot stage, g–i late spot stage, j–l late ring stage, m–o swimming larval stage. a–l, o Histological sections. m, n Whole mounts. b, c, e, f, h, i, k, l Magnification of granular cells for each embryological stage; some circled with dashed line. EL epithelial layer, FC flask cells, GC granular cells, ICM inner cell mass, MFC migrating flask cells, PR pigment ring, PS pigment spot, SEL subepithelial layer. Scale bars: a, g, j, m, 150 μm; b, e, f, h, i, n, 25 μm; c, k, l, 10 μm; d, 125 μm; o, 200 μm

Based on the localized and dynamic expression of AmqNF-κB during A. queenslandica development, it appears that this NF-κB is playing a developmental role, although there is no evidence to support it having a homologous role to its bilaterian orthologues. Because the NF-κB signaling pathway is also involved in immunity, it will be intriguing to further test the functionality of AmqNF-κB to establish whether it plays such a role in A. queenslandica. Because sponges are known to possess a complex immune system (see review by Müller et al. 1999) and, more specifically, genes known to interact with NF-κB (Wiens et al. 2005, 2007), it is possible that NF-κB also has an immune function in A. queenslandica.

Evolution of NF-κB

ANK repeats were detected in all the screened genomes (the placozoan T. adhaerens, the choanoflagellate M. brevicollis, and the sponge A. queenslandica), and they are known to be present in plants, fungi, and other sponges (Bork 1993; Müller et al. 2001). In contrast, RHD could not be detected in genomes of representative fungi, M. brevicollis or T. adhaerens. The presence of a single RHD in A. queenslandica reveals that this is likely to be a metazoan-specific innovation that subsequently combined with the more ancient ANK repeats by a currently unknown mechanism to give rise to the ancestral member of the Rel/NF-κB family. The origin of this ancestral NF-κB predates the divergence of sponge, cnidarian, and bilaterian lineages (Fig. 6). Because the Demospongiae is a lineage branching earlier than the Cnidaria, the domain arrangement observed in NF-κB of N. vectensis, with only the RHD present, could have resulted from a secondary loss of the ANK repeat domain (Fig. 6). Domain shuffling during early metazoan evolution has indeed been shown to contribute to the generation of metazoan protein diversity (Adamska et al. 2007b).

Fig. 6

Evolutionary model for the evolution of RHD and ANK-repeat containing genes based on phylogenetic analyses and sequence alignments. In this model, a stepwise expansion of RHD-containing genes is shown. Some of these events could have occurred before metazoan cladogenesis, with gene loss occurring in the sponge lineage. Proteins and domains are not drawn to scale. TAD transactivation domain, G glycine, P proline, S serine

No IκB match was found other than the C-terminal of AmqNF-κB, which suggests that a single ancestral NF-κB that encoded a self-regulating IκB at its 3′ end was the basic requirement for the pathway to be established. It will be necessary to determine whether AmqNF-κB functions like p105 and p100 and is proteolytically processed to be activated. While NFATs were not identified in A. queenslandica, they are present in N. vectensis. Therefore, we infer that the NFATs evolved from a duplicated RHD–ANK repeat ancestor, which subsequently lost the ANK repeats. Based on the lack of evidence of an NFAT orthologue in A. queenslandica, we suggest that this occurred in the period after the divergence of the sponge lineage from the main eumetazoan lineage (Fig. 6).

Because Rel-like proteins are absent from both A. queenslandica and N. vectensis, but cnidarians possess IκB, it is unclear at what stage these proteins evolved. It is possible that a duplication followed by a scission event gave rise to both Rel factors and IκBs prior to the Cnidaria–Bilateria split and Rel was secondarily lost in N. vectensis. Another possibility is that IκBs and Rels arose independently. It is interesting to note that a recent study established that NF-κB has either been lost or diverged beyond recognition in Hydra (Miller et al. 2007), suggesting that it will be quite difficult to retrace this evolutionary step. Nonetheless, it will be worth establishing what the representatives of the Rel/NF-κB and IκB families are in other cnidarians (coral, jellyfish, etc.), as well as in the ctenophores, to confirm which is the most likely scenario.

Finally, we cannot exclude the possibility that the metazoan ancestor possessed multiple RHD-containing genes, including possibly proto-Rel and proto-NFAT genes, and other genes containing IκB-like ANK repeats, and that these genes were lost in the sponge lineage leading to A. queenslandica. As the trees generated in this study (Fig. 4) for both RHD and ANK repeats are not particularly well-resolved or -supported, we cannot find phylogenetic evidence for gene loss. Reduced membership of Rel/NF-κB and NFAT families in A. queenslandica (i.e., just AmqNF-κB) is similar to that observed in other transcription factor gene families and classes, including ANTP, PRD, POU, LIM, and TALE homeoboxes; basic helix–loop–helix; Sox; T-box; and Fox (Larroux et al. 2007; Simionato et al. 2007; Larroux and Degnan unpublished data). More extensive sampling of basal metazoan taxa will help to resolve the composition of the ancestral metazoan genome.