T lymphocytes are divided into two subpopulations, αβ and γδ T cells, which express distinct heterodimeric T cell receptors (TR) containing either α and β, or γ and δ chains, respectively. TR variable domains are generated during lymphocyte development as a consequence of rearrangements between variable (V), diversity (D), and joining (J) genes for β and δ chains, and between V and J genes for α and γ chains. After transcription, the V–(D)–J sequence is spliced to the constant (C) gene.

The genomic organization of the locus encoding the TR gamma chain has been described in human, mouse, and dog, species in which γδ T cells account for a small proportion of the peripheral T cells (“γδ low” species) (Janeway et al. 1988; Faldyna et al. 2005).

The human TRG locus spans about 150 kb and comprises 14 TRGV genes, of which only six are functional, positioned upstream of two J–C clusters comprising three and two TRGJ genes, respectively, and a single TRGC gene each (Lefranc et al. 1989; IMGT/GENE-DB, Giudicelli et al. 2005; IMGT Repertoire, http://www.imgt.org, Lefranc et al. 2009). In contrast, the mouse TRG locus features four V–J–C recombination units (cassettes) spanning about 180 kb. The TRGC1 cassette has four TRGV genes, all of which are functional, one TRGJ gene, and one TRGC gene; the TRGC3, TRGC2, and TRGC4 cassettes each consist of one TRGV, one TRGJ, and one TRGC gene. However, the TRGC3 cassette is not functional because TRGC3 is a pseudogene. The TRGC2 cassette is inverted in the locus with respect to the other three cassettes (Vernooij et al. 1993; IMGT Repertoire, http://www.imgt.org, Lefranc et al. 2009). In both the human and mouse loci, enhancer elements that control the general accessibility of the locus have been identified at the 3′ end of the human TRG locus and at the 3′ end of each mouse V–J–C cassette (Spencer et al. 1991; Hettmann and Cohen 1994).

Recently, the high-quality draft genome sequence of the domestic dog allowed us to infer, for the first time in a mammalian species belonging to Carnivora, the genomic structure of the TRG locus (Massari et al. 2009). The dog TRG locus spans about 460 kb and is organized into eight gene cassettes aligned in tandem in the same transcriptional orientation, each containing the basic recombinational unit V–J–J–C, with the exception of the most downstream cassette, which lacks TRGV genes. The structural organization includes 16 TRGV genes, 8 of which are pseudogenes, 16 TRGJ genes, 8 TRGC genes, 8 enhancer elements, one at the 3′ end of each cassette, and 1 additional enhancer between the TRGV2-4 and TRGV4-1 genes in the TRGC5 cassette.

The genomic organization of the TRG locus has also been determined in “γδ high” species such as ruminants (Hein and Mackay 1991). In sheep, as in dog, the TRG locus is characterized by a reiterated duplication of V–J–J–C cassettes, which lie in two distinct paralogous loci on the same chromosome (Massari et al. 1998). In particular, the sheep TRG1 locus, containing the TRGC5, TRGC3, and TRGC1 cassettes, spans about 160 kb and maps to 4q31, while the sheep TRG2 locus, with the TRGC6, TRGC2, and TRGC4 cassettes, is 95 kb long and lies on 4q15–22 (Miccoli et al. 2003; Vaccarelli et al. 2005). Thirteen TRGV genes are distributed among the six cassettes; two of them are pseudogenes, both located within the TRGC5 cassette. Six enhancer elements downstream of the TRGC genes have also been recognized. Three additional enhancer elements, two between the TRGV genes of the TRGC3 cassette and one upstream of the TRGV5-1 gene within the TRGC1 cassette, have also been identified (Vaccarelli et al. 2008).

The different organization of the TRG locus among species is a strong incentive to characterize it in other species, especially in organisms that display high percentages of γδ T cells in peripheral blood. The rabbit is another γδ high mammalian species, belonging to Lagomorpha (Sawasdikosol et al. 1993), and has proved to be an invaluable model in immunological research, including vaccine development, and a resource for diagnostic and therapeutic antibodies. Annotation and analysis of the rabbit genome are therefore of importance for biomedicine and of special significance to immunologists. The Broad Institute (www.broadinstitute.org) has submitted the second whole genome assembly (OryCun2.0) of the European rabbit (Oryctolagus cuniculus), completed at 6.51× coverage, to GenBank (BioProject ID: 42933). We employed this genome assembly to identify the TRG locus in this species. We retrieved directly from the chromosome 10 genomic scaffold (GenBank ID: NW_003159273; pos. 18444960–18669910) the sequence comprising the amphiphysin (AMPH) and the related to steroidogenic acute regulatory protein D3-N-terminal like (STARD3NL) genes, which flank, respectively, the 5′ and 3′ ends of all mammalian TRG loci studied so far (Glusman et al. 2001; Massari et al. 2009). All TRG genes within the genome sequence were identified and annotated using both the human TRG locus (Supplementary Table 1) as a reference sequence and the rabbit cDNA collection of Isono et al. (1995). The recombination signal sequences were also searched manually. In this way, we identified the rabbit TRG locus in a region of about 70 kb, containing ten TRGV genes upstream of two TRGJ genes and one TRGC gene (Supplementary Fig. 1). A possible enhancer element was identified about 8 kb downstream of the last exon of the TRGC gene (GenBank ID: NW_003159273; pos. 18466382–18466632). The AMPH gene is 13 kb upstream of the first TRGV gene, while STARD3NL is 1 kb downstream of the enhancer-like region, in an inverted transcriptional orientation. On the basis of these structural observations, we infer that the rabbit locus is the simplest and smallest among the mammalian TRG loci identified to date, and that its organization resembles that of the human TRG locus except for the absence of the J–C duplication (Lefranc et al. 1989; Lefranc and Lefranc 2001; IMGT Repertoire, http://www.imgt.org).
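
The search for recombination signal sequences was done manually, but a first pass can be sketched in code. The Python fragment below is only an illustration and not the procedure used here: it assumes the canonical consensus heptamer CACAGTG and nonamer ACAAAAACC separated by a 12 or 23 bp spacer, scans the forward strand only, and uses a hypothetical function name (`find_rss`) and an arbitrary mismatch tolerance.

```python
import re

HEPTAMER = "CACAGTG"    # canonical consensus heptamer (assumption of this sketch)
NONAMER = "ACAAAAACC"   # canonical consensus nonamer (assumption of this sketch)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_rss(seq, spacers=(12, 23), max_mismatch=2):
    """Return (heptamer_start, spacer_length) for each candidate
    heptamer-spacer-nonamer arrangement on the forward strand.

    The first three heptamer bases (CAC) are required to match exactly;
    the rest of the heptamer and the nonamer may each differ from the
    consensus at up to `max_mismatch` positions.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer("(?=CAC)", seq):
        i = m.start()
        heptamer = seq[i:i + 7]
        if len(heptamer) < 7 or hamming(heptamer, HEPTAMER) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 7 + spacer
            nonamer = seq[j:j + 9]
            if len(nonamer) == 9 and hamming(nonamer, NONAMER) <= max_mismatch:
                hits.append((i, spacer))
    return hits


# Toy sequence (not rabbit data): one 23-bp-spacer signal starting at index 2.
toy = "GG" + HEPTAMER + "T" * 23 + NONAMER + "GG"
print(find_rss(toy))
```

Signals upstream of J genes lie in the opposite orientation, so a fuller scan would also run the reverse complement; any hit would still need manual inspection against the flanking coding sequence.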

To provide additional information on the genomic structure of the rabbit locus, we first screened the TRG sequence, including the STARD3NL gene, with the RepeatMasker program (Smit AFA, Hubley R, Green P RepeatMasker at http://repeatmasker.org) to highlight interspersed repeats and compositional properties. Interspersed repeats account for 20.77 % of the sequence [mainly short interspersed transposable elements (SINEs)], and the GC content is 42.40 % (Supplementary Table 2). The reduced size of the rabbit locus compared with that of the other mammalian species seems to be related not only to the low number of TRG genes, as a result of a reduced number of duplicative events, but also to a low proportion of LINEs (Supplementary Table 2). LINEs, predominantly of the LINE1 family, contribute significantly to the architecture of the TRG locus: they are the predominant repeat elements in mouse (29.80 %) and dog (20.59 %), but they are underrepresented in human (7.56 %) (Massari et al. 2009) and even more so in rabbit (4.18 %). Conversely, SINEs are the most abundant elements in rabbit (14.78 %) (Supplementary Table 2) and human (13.95 %), in contrast to dog (8.59 %) and mouse (2.85 %) (Massari et al. 2009).
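
The two summary figures reported for the locus, repeat density and GC content, are simple proportions. As a minimal sketch, assuming a soft-masked sequence in which repeats appear in lowercase (or as N), they could be computed as follows. The function names and the toy string are hypothetical, and RepeatMasker reports these values itself.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def masked_fraction(seq):
    """Fraction of positions that are masked, taken here to mean
    lowercase letters (soft masking) or N (hard masking)."""
    if not seq:
        return 0.0
    return sum(ch.islower() or ch == "N" for ch in seq) / len(seq)


# Toy soft-masked sequence (not rabbit data): 4 of 10 bases are masked.
toy = "ACGTggccAT"
print(f"GC content: {gc_content(toy):.2%}")
print(f"masked:     {masked_fraction(toy):.2%}")
```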

The masked sequence was then aligned against itself using the PipMaker program (Schwartz et al. 2000; http://pipmaker.bx.psu.edu/pipmaker/), and the alignment was expressed as a percentage identity plot (pip) (Supplementary Fig. 2). The pip shows the position of all TRG genes, the STARD3NL gene, the enhancer element identified in the region, and the location and orientation of all repetitive sequences. The occurrence of redundant lines with a nucleotide identity >75 % identifies the first seven TRGV genes as belonging to the same TRGV1 subgroup. Redundant lines with a nucleotide identity <75 % identify two other TRGV subgroups, classified as TRGV2 and TRGV3, each with a single member. The last TRGV gene, named TRGV4, is markedly different from all other TRGV genes, as is evident from the absence of identity lines. Therefore, four different TRGV subgroups are present in rabbit, and they were classified in accordance with their genomic position. Two independent TRGJ genes (TRGJ1 and TRGJ2) and one TRGC gene, with exon 2 and exon 3 closely related as indicated by redundant lines in the pip, are positioned downstream of the TRGV cluster. The position of all TRG genes in the rabbit genome, together with their IMGT classification (Lefranc 2007, 2011a), is reported in Tables 1, 2, and 3. Sequence analysis showed that all rabbit TRG genes are functional according to the IMGT criteria (Lefranc 2011b).
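
The 75 % identity criterion used to assign genes to subgroups can be illustrated with a short Python sketch. It is not the PipMaker procedure: it assumes pre-aligned sequences of equal length, computes pairwise identity over ungapped columns, and groups genes by single linkage. The function names, gene names, and sequences are hypothetical toy values.

```python
def identity(a, b, gap="-"):
    """Fraction of identical bases over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def subgroups(aligned, threshold=0.75):
    """Single-linkage grouping: two genes share a subgroup when their
    pairwise identity exceeds `threshold`."""
    names = list(aligned)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(aligned[a], aligned[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Toy aligned sequences (not rabbit data).
toy = {
    "V_a": "ACGTACGTAC",
    "V_b": "ACGTACGTAA",  # 90 % identical to V_a
    "V_c": "TTTTTTTTTT",  # 20 % identical to V_a and V_b
}
print(subgroups(toy))
```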

Table 1 Description of the TRGV genes in the rabbit genome
Table 2 Description of the TRGJ genes in the rabbit genome
Table 3 Description of the TRGC genes in the rabbit genome

The deduced amino acid sequences of the rabbit TRGV genes were manually aligned according to the IMGT unique numbering for the V-REGION (Lefranc et al. 2003) to maximize homology (Fig. 1a). All sequences exhibit the typical framework regions (FR) and complementarity determining regions (CDR) and the conserved amino acids: cysteine 23 (1st-CYS) in FR1-IMGT, tryptophan 41 (CONSERVED-TRP) in FR2-IMGT, and cysteine 104 (2nd-CYS) in FR3-IMGT. The TRGV genes belong to four different subgroups, and this is reflected in the amino acid changes and in the different CDR-IMGT lengths: [5.8.4] for the TRGV1 subgroup genes, [8.8.4] for the TRGV2 subgroup, [8.6.7] for the TRGV3 subgroup, and [8.4.4] for the TRGV4 subgroup, the latter three subgroups each being represented by a single gene. Interestingly, the TRGV1 genes differ in the length of FR3: three genes, TRGV1-2, TRGV1-4, and TRGV1-6, have an in-frame three-nucleotide deletion that makes the D strand one amino acid shorter (position 81) than in the other TRGV genes. The presence of the two distinct types of TRGV1 genes, alternately arranged within the locus, indicates that the duplication events involved two ancestral rabbit TRGV genes that underwent three in-tandem duplications, with the TRGV1-7 gene having been duplicated earlier.
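
The CDR-IMGT length notation follows directly from a gapped alignment once the IMGT unique numbering is applied: CDR1-IMGT covers positions 27 to 38, CDR2-IMGT positions 56 to 65, and the germline CDR3-IMGT starts at position 105. A minimal Python sketch follows, assuming one character per IMGT position with "." for a gap. The function names and the toy sequence are hypothetical and do not correspond to any rabbit gene.

```python
# IMGT unique numbering for the V-REGION (1-based, inclusive ranges).
CDR_RANGES = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}
# Conserved positions: 1st-CYS 23, CONSERVED-TRP 41, 2nd-CYS 104.
ANCHORS = {23: "C", 41: "W", 104: "C"}
GAP = "."


def cdr_lengths(gapped):
    """Count non-gap residues in each CDR-IMGT of a gapped V-REGION,
    where gapped[i] is the residue at IMGT position i + 1."""
    lengths = []
    for start, end in CDR_RANGES.values():
        segment = gapped[start - 1:end]
        lengths.append(sum(ch != GAP for ch in segment))
    return lengths


def anchors_ok(gapped):
    """Check the three conserved residues at their IMGT positions."""
    return all(
        len(gapped) >= pos and gapped[pos - 1] == aa
        for pos, aa in ANCHORS.items()
    )


# Toy V-REGION built region by region (placeholder residues only).
fr1 = "A" * 22 + "C" + "A" * 3      # positions 1-26, Cys at 23
cdr1 = "GGGGG" + GAP * 7            # positions 27-38, 5 residues
fr2 = "AA" + "W" + "A" * 14         # positions 39-55, Trp at 41
cdr2 = "S" * 8 + GAP * 2            # positions 56-65, 8 residues
fr3 = "A" * 38 + "C"                # positions 66-104, Cys at 104
cdr3 = "ATWD"                       # positions 105-108, 4 residues
toy = fr1 + cdr1 + fr2 + cdr2 + fr3 + cdr3

print(cdr_lengths(toy), anchors_ok(toy))
```

In this representation, a one-residue deletion in FR3 such as the one at position 81 would simply appear as a gap character at that position.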

Fig. 1

The IMGT Protein display of the rabbit TRGV (a), TRGJ (b), and TRGC (c) genes. The description of the strands and loops is according to the IMGT unique numbering for V-REGION (Lefranc et al. 2003) (a) and C-DOMAIN (Lefranc et al. 2005) (c)

The deduced amino acid sequences of the two TRGJ genes are reported in Fig. 1b. The two TRGJ genes differ substantially in length; the FGXG amino acid motif, which characterizes the TRGJ genes, is present.
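
Checking a translated J region for the FGXG motif amounts to a one-line pattern search. The Python sketch below is illustrative only. The function name and both peptide strings are hypothetical placeholders, not the rabbit TRGJ sequences.

```python
import re

# F, G, any residue, G.
FGXG = re.compile(r"FG.G")


def find_fgxg(j_region):
    """Return the 0-based start of the first FGXG motif, or None if absent."""
    match = FGXG.search(j_region.upper())
    return match.start() if match else None


# Placeholder peptides (not real TRGJ sequences).
with_motif = "AAKLFGSGTKL"
without_motif = "AAKLFAKGTKL"
print(find_fgxg(with_motif), find_fgxg(without_motif))
```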

The amino acid sequence of the four TRGC exons was also deduced (Fig. 1c). The first exon (EX1), encoding the extracellular region, was described according to the IMGT unique numbering for the C-DOMAIN (Lefranc et al. 2005) and comprises 109 amino acids (AA) instead of 110, owing to a shorter FG loop (12 AA instead of 13). Two exons (EX2A and EX2B), of 20 AA each, encode the first part of the connecting region, a situation reminiscent of the human polymorphic TRGC2 gene, where the same region is encoded by two exons of 16 AA each in TRGC2 (2×), and even by an additional third exon in TRGC2 (3×) (Buresi et al. 1989). A TLVG amino acid motif is repeated at the beginning of each of the two exons. The remaining part of the connecting region (16 AA), the transmembrane region (24 AA), and the cytoplasmic region (7 AA) are encoded by exon 3 (EX3).

We compared the annotated rabbit TRG sequence with those of human, mouse, sheep, and dog (Supplementary Table 1) by means of the PipMaker program. The dot plot matrix of the rabbit TRG locus against the human locus (Supplementary Fig. 3a) shows that the similar genomic organization of these two loci is accompanied by high sequence identity along the entire region. When considering TRGV genes only, the matrix shows a high level of nucleotide identity between the rabbit and human TRGV1 subgroups, both located in the 5′ part of the locus. Further downstream, the human TRGV10–TRGVB–TRGV11 region shows identity with the area corresponding to the rabbit TRGV2–TRGV3 genes, with the rabbit TRGV2 gene more similar to the human TRGV10 gene and rabbit TRGV3 to the human TRGVB–TRGV11 genes. A total lack of identity was evident for rabbit TRGV4 as well as for the human TRGV9 and TRGVA genes. Moreover, the matrix displays two identity diagonals, in line with the single J–J–C cluster in rabbit and the two in-tandem J–J–C clusters in human, with the intergenic portions preserved. The longest diagonal corresponds to the JP2-J2-C2-En-STARD3NL block.
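
Why a single cluster compared against a duplicated one yields two diagonals can be shown with a toy word-match dot plot. The Python sketch below is not PipMaker's alignment method: it records exact k-mer matches between two sequences and reports the diagonal offsets that collect several matches. Function names, the word size, and the sequences are arbitrary assumptions.

```python
from collections import Counter, defaultdict


def dot_matches(a, b, k=4):
    """Return (i, j) pairs where the k-mer at a[i:] equals the k-mer at b[j:]."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    return [
        (i, j)
        for i in range(len(a) - k + 1)
        for j in index.get(a[i:i + k], [])
    ]


def diagonals(matches, min_hits=3):
    """Offsets (j - i) supported by at least `min_hits` word matches."""
    counts = Counter(j - i for i, j in matches)
    return sorted(d for d, n in counts.items() if n >= min_hits)


# Toy example: one unit on the first axis, the same unit duplicated
# in tandem (with an unrelated spacer) on the second axis.
unit = "ACGTTGCAAGCT"
single = unit
duplicated = unit + "GGGGGGGG" + unit

print(diagonals(dot_matches(single, duplicated)))
```

Each copy of the unit in the second sequence produces its own diagonal, at offsets 0 and 20 in this toy case, which mirrors the way duplicated J–J–C clusters appear as parallel identity lines.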

In contrast, the rabbit/mouse dot plot matrix (Supplementary Fig. 3b) shows only short stretches of similarity corresponding to the exons of the TRG genes, with a longer, although interrupted, identity line between the rabbit 3′ end and the mouse J4-C4-En-STARD3NL region. As regards the TRGV genes, the single mouse TRGV7 gene shows identity with the rabbit TRGV1 subgroup, the mouse TRGV4, TRGV3, TRGV2, and TRGV1 genes show identity with the rabbit TRGV2 and TRGV3 genes, and the mouse TRGV5 gene matches the rabbit TRGV4 gene.

The rabbit vs sheep and rabbit vs dog matrices clearly reveal the importance of the J–J–C clusters in the evolution of the sheep and dog TRG loci. In the dot plot matrix obtained from the comparison between the rabbit TRG locus and the sheep TRG1 and TRG2 loci (Supplementary Fig. 3c), the most striking feature is the presence of six identity diagonals at the level of the rabbit J–J–C block, which account for the six duplications of the sheep J–J–C regions (three for each locus). In particular, the most continuous diagonal is at the J–J–C5 cluster within the sheep TRG1 locus. It is noteworthy that all sheep J–J–C clusters show a region of identity at their 3′ end with the last two exons of the STARD3NL gene. This result suggests a duplicative mechanism in sheep starting from the 3′ end of the ancestral TRG locus and involving the entire V–J–J–C unit, the enhancer, and the last portion of the STARD3NL gene. The block of identity, interrupted in the C4 cassette of the TRG1, appears to continue in the TRG2 locus, pointing to the evolutionary split that broke the sheep locus into two parts by a translocation event (Vaccarelli et al. 2008).

Concerning the TRGV genes, the rabbit TRGV1 subgroup shows identity with six sheep TRGV genes located in the C3 (three genes), C4, C1, and C2 cassettes, while the rabbit TRGV2–TRGV3 region matches the sheep TRGV3-1–TRGV3-2–TRGV7 area in the C5 cassette. The rabbit TRGV4 gene shows identity with the sheep TRGV4 gene in the same cassette. Similarly, the dot plot matrix obtained from the rabbit and canine comparison shows eight identity diagonals related to the eight canine J–J–C duplications (Supplementary Fig. 3d). In all cassettes, the block of identity is interrupted, producing intermittent lines. As in sheep, the cassettes show a longer region of similarity at their 3′ end, corresponding to the last part of the STARD3NL gene, suggesting that sheep and dog may have shared the same duplication mechanism. Surprisingly, a total lack of identity is evident among TRGV genes, with the exception of the rabbit TRGV3 gene, which shows identity with the dog TRGV3 and TRGV7 subgroups.

Overall, we observed a substantial similarity between the rabbit and human TRG sequences, which is in line with the accepted phylogeny. Unlike in human and rabbit, the TRG locus in mouse, sheep, and dog is organized as multiple clusters. In these cases, the genes are distributed across hundreds of kilobases of genomic sequence. Out of line with the accepted phylogeny (Bininda-Emonds et al. 2007), a more consistent similarity was observed between rabbit and sheep than between rabbit and mouse or dog. Altogether, our comparative data support the idea that the structural organization of the TRG locus has evolved independently in the different mammalian species and cannot by itself explain the “γδ low” condition of human, mouse, and dog with respect to the “γδ high” condition of sheep and rabbit.

The species-specific evolution of the structural organization of the TRG locus was confirmed by phylogenetic analysis of the TRGV genes (Fig. 2). The coding nucleotide sequences of all available TRGV genes (from FR1-IMGT to FR3-IMGT) from rabbit, human, mouse, sheep, and dog were combined in the same alignment, and an unrooted phylogenetic tree was built using the neighbor-joining (NJ) method. The tree shows that the TRGV genes can be distributed into four main sets labeled A through D (Fig. 2). Set A comprises the rabbit TRGV1 multimember subgroup with the related human TRGV1, mouse TRGV7, and sheep TRGV5, TRGV1, TRGV9, TRGV2, and TRGV8 subgroups, without any dog counterpart. It is noteworthy that the genes form species-specific sets, indicating that each of the mammalian species inherited an ancestral TRGV gene, which, with the exception of the single mouse TRGV7, underwent duplication events independently in each lineage. Moreover, if we look at the genomic position of these genes within the different TRG loci (IMGT Repertoire, http://www.imgt.org), we can deduce that the in-tandem duplication mechanism seems to have involved only the TRGV genes in rabbit and human, whereas it has affected an entire V–J–C cassette in sheep. The sister set B lacks rabbit and mouse genes. It consists of six dog TRGV genes belonging to the TRGV2, TRGV4, and TRGV6 subgroups, the human TRGVA pseudogene, and the sheep TRGV6-1 gene. The presence of distinct dog subgroups in this branch indicates that they emerged by recent duplication and divergence events in the dog lineage. Sets C and D both contain TRGV genes belonging to all mammalian species. It should be noted that the rabbit TRGV2, TRGV3, and TRGV4 genes are intermingled with the other mammalian genes, so that possible orthologous genes can be identified.

Fig. 2

Evolutionary relationships of TRGV genes. Multiple alignments of the sequences were carried out with the MUSCLE program (Edgar 2004). The evolutionary history was inferred using the NJ method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) is shown next to the branches. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method and are in the units of the number of base substitutions per site. All positions containing gaps and missing data were eliminated. Evolutionary analyses were conducted in MEGA5 (Tamura et al. 2011). The species are indicated by the standardized IMGT six-letter code (or nine-letter code for subspecies) derived from Latin names: Orycun for O. cuniculus, Musmus for Mus musculus, Oviari for Ovis aries, Homsap for Homo sapiens, Canlupfam for Canis lupus familiaris. The accession numbers of all sequences are reported in Supplementary Table 3
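
The distance step that precedes tree building can be sketched in a few lines of Python. This is not the MEGA5 procedure: complete deletion of columns with gaps or missing data is implemented as described in the legend, but the Maximum Composite Likelihood distance is replaced here by the simpler Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), purely for illustration. Function names and sequences are hypothetical.

```python
import math


def complete_deletion(aligned):
    """Drop every alignment column in which any sequence has a character
    other than A, C, G or T (gaps, N, other ambiguity codes)."""
    names = list(aligned)
    columns = zip(*(aligned[n].upper() for n in names))
    kept = [col for col in columns if all(ch in "ACGT" for ch in col)]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def p_distance(a, b):
    """Proportion of differing sites between two equal-length, non-empty sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69(p):
    """Jukes-Cantor corrected distance in substitutions per site
    (defined only for p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy alignment (not TRGV data); column 5 contains a gap and an N.
toy = {
    "s1": "ACGT-ACGTA",
    "s2": "ACGTAACGTT",
    "s3": "ACGTNACGAA",
}
clean = complete_deletion(toy)
for a, b in [("s1", "s2"), ("s1", "s3"), ("s2", "s3")]:
    p = p_distance(clean[a], clean[b])
    print(a, b, round(p, 4), round(jc69(p), 4))
```

The resulting pairwise matrix is the kind of input a neighbor-joining implementation takes; the choice of substitution model changes the distances but not this overall workflow.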

Our analysis substantiates the idea that a progenitor of four phylogenetic TRGV sets was present in the common ancestor of mammals and that the species have lost or duplicated genes belonging to the different sets, regardless of the γδ condition. Recently, the genomic organization of the TRG locus in the sandbar shark (Carcharhinus plumbeus) has been determined. It consists of at least five TRGV subgroups, distributed in four phylogenetic groups, three TRGJ genes, and one TRGC gene in about 30 kb (Chen et al. 2009). Based on these data, we propose that the simple rabbit TRG structure may resemble an ancestral vertebrate arrangement that consisted of four TRGV genes, two TRGJ genes, and one TRGC gene.