The vertebrate immune system employs a wide variety of antigen-specific receptors - the immunoglobulins and T-cell receptors - to recognize and neutralize foreign invaders. The receptor diversity necessary to recognize an almost limitless universe of potential pathogens is created by a site-specific DNA rearrangement process termed V(D)J recombination. This unique reaction assembles the receptor genes from separate V, D and J gene segments, a process ostensibly restricted to lymphocytes at a certain stage of their development. In this view, the prediction is that the germline DNA of a given organism should contain unrearranged receptor genes; rearranged versions of the genes should exist only in lymphocytes. This prediction was satisfied by examination of germline and lymphocyte DNA samples from a variety of familiar vertebrates, including mice, humans, birds, and common farm animals [1,2,3]. The fly in the ointment, however, appeared with an analysis of receptor genes in the germline of evolutionarily distant organisms - the cartilaginous fishes. In these primitive vertebrates, many of the immunoglobulin genes are actually found pre-rearranged in the germline (reviewed in [2,3]). This suggests that the V(D)J recombinase may actually be an evolutionary force, a notion that is strongly supported by recent studies in the nurse shark [4]. The evolutionary consequences of these site-specific germline gene rearrangements may reach far beyond the immune system.

The basics of V(D)J recombination

The recombination machinery recognizes DNA sequences called recombination signal sequences (RSSs) adjacent to each gene segment. Each RSS consists of conserved heptamer and nonamer motifs separated by 12- or 23-nucleotide 'spacer' sequences. The recombinase is made up of two proteins, RAG-1 and RAG-2, which, in conjunction with the non-specific DNA-bending proteins, HMG-1 or HMG-2, recognize the RSS and catalyze site-specific DNA cleavage (see [5,6] for review). Cleavage requires both RAG-1 and RAG-2, and co-expression of these proteins is thought to be limited to developing lymphocytes ([7], and reviewed in [6]). As illustrated in Figure 1, the RAG proteins introduce a double-strand DNA break precisely between the V, D or J coding sequence and the RSS, generating two types of DNA ends: blunt signal ends (which terminate in the RSS) and covalently sealed (hairpin) coding ends (which terminate in the V, D or J element). After cleavage, the two signal ends are joined, producing a signal joint. Prior to joining the coding ends, the hairpins must be opened; joining generates a coding joint that may have lost or gained nucleotides.
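The heptamer-spacer-nonamer layout described above can be illustrated with a short sequence scan. This is only a minimal sketch: the consensus motifs used here (heptamer CACAGTG, nonamer ACAAAAACC) are commonly cited consensus sequences that are not given in this article, and the function name `find_rss` is hypothetical. Real RSSs diverge considerably from the consensus, so an exact-match search of this kind would miss most functional sites.

```python
import re

# Assumed consensus motifs (not stated in this article); real RSSs vary.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"


def find_rss(seq, spacers=(12, 23)):
    """Return sorted (start, spacer_length) pairs for exact consensus RSSs.

    An RSS is modelled as heptamer + spacer of 12 or 23 nt + nonamer,
    read on the given strand only.
    """
    hits = []
    for spacer in spacers:
        # A lookahead is used so that overlapping candidates are all reported.
        pattern = re.compile(
            f"(?={HEPTAMER}[ACGT]{{{spacer}}}{NONAMER})"
        )
        for match in pattern.finditer(seq):
            hits.append((match.start(), spacer))
    return sorted(hits)


# Toy sequence: two coding-flank bases followed by a 12-spacer RSS.
example = "GG" + HEPTAMER + "A" * 12 + NONAMER + "TT"
rss_hits = find_rss(example)
```

In this toy example the heptamer begins at index 2, so RAG-mediated cleavage would fall between `example[:2]` (the coding flank) and `example[2:]` (the signal end), assuming the coding segment lies upstream of the RSS.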

The precise details of the end-processing and joining reactions remain obscure, and are not important for this story; it is clear, however, that multiple, non-lymphoid-specific DNA repair proteins are involved. One important detail is that the opening of the hairpins frequently occurs 'off center', leading to palindromic single-stranded tails. Joining of these ends can give rise to a characteristic signature in the completed junction: a palindromic, or P nucleotide, insertion [8,9]. Another type of junctional insertion consists of N (non-templated) nucleotides, which are added randomly by the enzyme terminal deoxynucleotidyl transferase. Thus, coding joints formed by V(D)J recombination have several distinguishing characteristics, including variable loss of nucleotides and the frequent presence of either N nucleotides, P nucleotides, or both [5].
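How off-center hairpin opening yields P nucleotides can be made concrete with a small model. The sketch below assumes a simplified picture in which the hairpin is nicked a given number of nucleotides from its tip on the bottom strand, so that those bottom-strand nucleotides stay attached to the top strand. The function names and the `offset` parameter are hypothetical, introduced only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def open_hairpin(coding_end, offset):
    """Model hairpin opening `offset` nucleotides away from the tip.

    coding_end: top-strand sequence (5'->3') ending at the hairpin tip.
    Returns (extended_top_strand, p_nucleotides). With offset 0 the
    hairpin opens exactly at the tip and no P nucleotides are produced.
    """
    if offset < 0 or offset > len(coding_end):
        raise ValueError("offset must lie within the coding end")
    # Bottom-strand bases carried over to the top strand are the reverse
    # complement of the last `offset` top-strand bases.
    terminal = coding_end[len(coding_end) - offset:]
    p_nucleotides = reverse_complement(terminal)
    return coding_end + p_nucleotides, p_nucleotides


extended, p_nt = open_hairpin("CCGAG", 2)
```

Here the coding end `CCGAG` opened two nucleotides off center becomes `CCGAGCT`. The terminal `AGCT` is its own reverse complement, which is the palindromic signature that identifies a P insertion if it survives joining.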

V(D)J recombination bears many striking parallels to the movement of certain transposable elements, both in its general form and in important mechanistic details of the reaction (reviewed in [10,11]). In fact, recent biochemical experiments have shown that purified RAG proteins can catalyze transposition in the test tube, integrating a DNA fragment bearing signal ends into a target duplex (Figure 2) [12,13]. This reaction does not require specific DNA sequences in the target and, like many transposition reactions, creates a characteristic 'footprint': upon integration, three to five nucleotides of target DNA are duplicated on either side of the transposon. It should be noted that there is, as yet, no firm evidence that RAG-mediated transposition events can occur in living cells.
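The transposition 'footprint' can be sketched in the same spirit. The model below assumes the usual explanation for target site duplications, namely staggered cuts in the two target strands followed by gap repair; that mechanism is a simplification and is not spelled out in this article. The names `integrate` and `flanking_duplication` are hypothetical, and the three-to-five-nucleotide range is taken from the text above.

```python
def integrate(target, donor, pos, dup_len=5):
    """Insert donor into target, duplicating target[pos:pos + dup_len].

    Models staggered cuts at pos and pos + dup_len followed by gap
    repair, so the same short target sequence ends up on both sides
    of the inserted element.
    """
    if not 3 <= dup_len <= 5:
        raise ValueError("dup_len is expected to be 3 to 5 nucleotides")
    if pos < 0 or pos + dup_len > len(target):
        raise ValueError("duplicated region must lie within the target")
    return target[:pos + dup_len] + donor + target[pos:]


def flanking_duplication(seq, start, end, min_len=3, max_len=5):
    """Return the longest direct repeat flanking seq[start:end], or ''."""
    for k in range(max_len, min_len - 1, -1):
        if start - k >= 0 and end + k <= len(seq):
            if seq[start - k:start] == seq[end:end + k]:
                return seq[start - k:start]
    return ""


# Lowercase marks the inserted element in this toy example.
product = integrate("AAACCCGGGTTT", "nnnn", pos=3, dup_len=5)
footprint = flanking_duplication(product, 8, 12)
```

In the toy product `AAACCCGGnnnnCCCGGGTTT`, the five nucleotides `CCCGG` flank the inserted element on both sides. A check like `flanking_duplication` is the kind of test one would apply to a candidate insertion, although real duplications can be blurred by later mutation.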

Surprises from sharks and skates

Early on, it was suggested that the V(D)J recombination system might have arisen by the fortuitous integration of a transposable element into an ancestral antigen-receptor gene [14]. This hypothesis was strengthened by the discovery that the RAG genes are tightly linked [7], and by the finding that the RAG proteins can act as a transposase. Thus, a plausible model for the acquisition of the V(D)J recombination system during vertebrate evolution is the integration of a transposable element carrying the linked RAG genes into a primordial antigen-receptor gene in an ancestral jawed vertebrate, approximately 450 million years ago (reviewed in [1,11]). Presumably, this initial integration event created the first rearranging antigen-receptor gene; subsequent gene duplication events then created the multiple immunoglobulin and T-cell receptor loci.

To learn more about the evolutionary origins of the combinatorial immune system, several laboratories have characterized antigen-receptor loci from a wide variety of species, including the cartilaginous fishes - the living jawed vertebrates most phylogenetically distant from mammals. The immunoglobulin loci of sharks and skates contain some fairly typical antibody genes, with multiple V, D and J elements. Surprisingly, however, many of the immunoglobulin genes are already partially or fully rearranged in the germline (see [2,3] for review). Recently, pre-rearranged immunoglobulin genes were also found in a teleost fish, the channel catfish [15].

How did these pre-rearranged genes arise? One possibility is that they are descendants of the ancestral antigen-receptor gene, as it existed before integration of the putative transposable element. A second possibility is that these genes arose from RAG-mediated DNA rearrangement events that occurred in the germline, an operation that violates the precept that the RAG recombinase is functional only in developing lymphocytes. A recent paper by Lee et al. [4] addresses these questions by examining immunoglobulin genes in the nurse shark. The nurse shark NS4 immunoglobulin light-chain gene family proved particularly informative, as it includes several highly homologous genes present in both pre-rearranged and unrearranged forms. These features allowed the authors to evaluate sequences in sufficient detail to ascertain whether the genes bear characteristic features of V(D)J recombination or footprints of transposition. Their analysis revealed that the pre-rearranged genes did indeed contain tell-tale signs of coding joints formed by V(D)J recombination, including both N nucleotides and P nucleotides, the latter strongly suggesting a hairpin intermediate. The presence of these features in several junctions suggests several independent germline V(D)J recombination events, although analysis of multiple unrelated individuals suggests that these events are not frequent. Importantly, analysis of the unrearranged NS4 genes failed to detect the target site duplications that are hallmarks of transposon insertions. Thus, while the pre-rearranged NS4 genes appear to have been derived from unrearranged genes by germline V(D)J recombination, there is as yet no evidence to support the hypothesis that the unrearranged genes were derived from the pre-rearranged genes by insertion of a transposable element; this is discussed in a recent review by Lewis and Wu [16].

The recent studies of the nurse shark NS4 genes strongly suggest that V(D)J recombination events can occur in the germline. Moreover, phylogenetic analysis indicates that these events occurred recently (at least from an evolutionary perspective), some time within the last 7 million years. What evolutionary benefit might there be in germline recombination events? Pre-rearranged immunoglobulin genes may confer certain advantages over genes that must be assembled by recombination in individual lymphocytes. For example, pre-rearranged genes may encode receptors capable of recognizing common pathogens likely to be encountered during the neonatal period, before the development of a full repertoire of rearranged antigen receptors [16]. Furthermore, germline joining could have contributed to evolution of gene segment clusters, and possibly the evolution of D segments [16].

Could the V(D)J recombinase aid the generation of evolutionary diversity on a genome-wide scale?

The results described above raise a number of intriguing questions. Are the RAG proteins normally expressed during germ cell development? To my knowledge, co-expression of both RAG proteins outside the lymphoid system has not been reported, but RNA species encoding RAG-1 and RAG-2 have been detected in zebrafish ovary and Xenopus oocytes, respectively [17,18]. Even if the recombinase is not normally expressed in these tissues, though, inappropriate expression might occur occasionally, perhaps as a result of improper reprogramming of tissue-specific gene expression during development. Such rare events could underlie the apparently infrequent germline rearrangements of the immunoglobulin loci.

If RAG expression does occur during the development of germ cells, another important question arises: how are loci chosen for rearrangement? During normal B-lymphocyte differentiation, immunoglobulin loci undergo a carefully orchestrated series of rearrangements. D to J rearrangements of the heavy-chain genes occur first, followed by V to DJ rearrangements, followed in turn by rearrangement of the light-chain genes. T-cell receptor gene rearrangements in developing T lymphocytes follow a similar pattern. The carefully ordered sequence of rearrangements is critical for proper lymphocyte differentiation and is thought to be controlled by accessibility of the loci, mediated by alterations in chromatin structure (reviewed in [5,19]). RSSs present in antigen-receptor loci that have not been 'targeted' for rearrangement are used rarely, if at all.

The careful control of locus accessibility seen in lymphocyte differentiation may not be recapitulated in the development of germ cells - after all, these cells are not supposed to be expressing RAG recombinase activity. Furthermore, the V(D)J recombinase is not particularly picky about the sequences it can target for rearrangement. Lewis and co-workers have found that 'cryptic' sites capable of supporting rearrangement occur at least once in every 600 base pairs of a commonly used plasmid sequence, and they have estimated that there are at least 10 million such sites in the mammalian genome [20]. In fact, there is evidence that cryptic sites in the mammalian genome can serve as targets for RAG-mediated rearrangement in lymphocytes (reviewed in [5]). Thus, it is possible that RAG expression during germ-cell development might cause rearrangements of regions of the genome far removed from the immune receptor loci. These considerations suggest that the V(D)J recombinase might have been (and could still remain) a significant force shaping vertebrate evolution, by catalyzing V(D)J-like rearrangements and, perhaps, transposition. Comparison of genome sequences from a variety of organisms may allow some aspects of this notion to be tested.

Figure 1

V(D)J recombination occurs in several steps. First, the RAG proteins bind to the RSSs (triangles) and bring them together into a synaptic complex. Cleavage ensues, generating a pair of blunt signal ends and a pair of DNA hairpin coding ends. Joining of these ends generates signal and coding joints, respectively. The boxes represent V, D or J coding elements.

Figure 2

Transposition catalyzed by the RAG proteins. A fragment of DNA generated by RAG-mediated cleavage, with RAG proteins bound to the signal ends (the donor), can capture another DNA duplex (the target). The RAG proteins bound to the signal ends catalyze integration into the target, generating a characteristic duplication of the target sequence at the integration site (arrowheads). Other symbols are as described in Figure 1.