Background

The germ cell nuclear factor (GCNF, NR6A1) is a member of the nuclear receptor superfamily [1,2]. It was originally isolated from mouse cDNA libraries, and homologs have since been identified in humans, frogs and fish [3,4,5,6]. As no ligand has been identified, GCNF is designated an orphan receptor. It is also known as RTR (retinoid receptor-related testis-associated receptor) or NCNF (neuronal cell nuclear factor). Evolutionary studies have defined GCNF as the only known member of a sixth subfamily of nuclear receptors [7,8,9]. The mouse Gcnf gene is highly expressed in the developing nervous system, in the labyrinthine layer of the placenta and in the developing germ cells [8,10,11,12]. Two transcripts of approximately 7.5 kb and 2.4 kb are present in testis, but only the larger transcript is found in somatic cells. Hybridization experiments reveal that the size difference is at least partially due to the use of different polyadenylation sites [13]. Interestingly, when embryonal carcinoma cells are triggered to differentiate by retinoic acid, GCNF expression is transiently up-regulated and then down-regulated again [14,15,16].

Results and discussion

We have isolated genomic clones encompassing the mouse Gcnf gene, and have defined the intron-exon structure of the gene. Sequence analysis reveals that the coding region of Gcnf comprises 11 exons and 10 introns (Table 1). A bacteriophage lambda library and a cosmid library of genomic DNA of the mouse 129 strain were screened with the full-length Gcnf cDNA. The DNA from hybridizing clones was subcloned into pBluescript (SK) for further sequence analysis. Exons 3 and 4 were identified from bacteriophage subclones, and exons 6-11 were identified in cosmid-derived subclones. Additional intron-exon boundaries and the 5'-untranslated region (5'-UTR) were identified by genome walking analysis following the manufacturer's instructions (Clontech). DNA sequencing was performed on an ABI 377 sequencer using the dye terminator protocol (Perkin Elmer) and on a DNA sequencer model 400 (Li-Cor). The DNA sequences were processed using the Wisconsin Package Version 10.0 of the Genetics Computer Group (GCG), Madison, Wisconsin.

All intron-exon junctions obeyed the GT/AG rule ([17] and Table 1). The location of the intron-exon junctions relative to the peptide sequence is shown in Figure 1. The translational start and stop codons are on exons 1 and 11, respectively. Exon 1 contains the 244 bp untranslated sequence at the 5' end of the cDNA and codes for the first 33 amino acids (Figure 2). This cDNA, isolated by Hirose et al. ([7]; GenBank entry MMU09563), starts with an EcoRI site that is present in the genomic DNA. The T at position 174 is a G in our genomic isolate, which could represent a genomic variant. As no promoter has been identified for Gcnf, the sequence preceding the EcoRI site may contain promoter elements. It is also possible, however, that the promoter precedes a not-yet-identified additional exon in the 5'-UTR of Gcnf.
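The GT/AG rule referred to above states that an intron begins with the dinucleotide GT and ends with AG. As a minimal sketch of how such a check could be applied to a set of intron sequences, the following Python snippet tests both boundaries. The function name `obeys_gt_ag` and the toy sequences are hypothetical illustrations; they are not taken from the Gcnf gene or from Table 1.

```python
def obeys_gt_ag(intron: str) -> bool:
    """Return True if an intron starts with GT and ends with AG.

    RNA input (GU...AG) is accepted by converting U to T first.
    """
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy introns for illustration only (NOT Gcnf sequences).
toy_introns = {
    "toy_intron_a": "GTAAGTCCTTTCTTTTACAG",  # GT ... AG
    "toy_intron_b": "GCAAGTCCTTTCTTTTACAG",  # GC ... AG, violates the rule
}

results = {name: obeys_gt_ag(seq) for name, seq in toy_introns.items()}
for name, ok in results.items():
    print(name, "obeys GT/AG" if ok else "does not obey GT/AG")
```

In practice the input would be the intron sequences flanking each exon boundary of the genomic clones; the check itself only inspects the first and last two nucleotides.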

The amino-terminal domain of 75 amino acids is encoded by exons 1-4. Exon 4 also codes for the core DNA-binding domain (DBD) of 66 amino acids and for three additional amino acids (Figure 1). The DBD consists of two zinc-finger motifs that are encoded by separate exons in most vertebrate nuclear receptor genes, except for those of the COUP transcription factor subfamily. Evolutionary studies do not provide further evidence that these receptors are closely related to GCNF. A further domain important for DNA binding and for homodimeric interactions, known as the DBD carboxy-terminal extension, is encoded by the 56 bp of exon 5. The sizes of intron 2 and intron 4 were determined by PCR amplification of mouse genomic DNA. Exons 6 and 7 code for the hinge region, whereas exons 7-11 code for the putative ligand-binding domain. A variant (AGUAAA) of the typical AAUAAA polyadenylation signal and the cleavage site that is used in the testis are part of the eleventh exon [13].
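To illustrate how a polyadenylation signal such as the AGUAAA variant mentioned above might be located in a 3' sequence, the sketch below scans for the canonical AAUAAA hexamer and the AGUAAA variant. The function name `find_polya_signals` and the toy sequence are assumptions made for illustration; the sequence is not from exon 11 of Gcnf.

```python
def find_polya_signals(seq, signals=("AATAAA", "AGTAAA")):
    """Return (0-based position, hexamer) for each signal found in seq.

    RNA input is accepted by converting U to T before scanning.
    """
    dna = seq.upper().replace("U", "T")
    hits = []
    for i in range(len(dna) - 5):
        hexamer = dna[i:i + 6]
        if hexamer in signals:
            hits.append((i, hexamer))
    return hits


# Hypothetical toy 3' sequence for illustration only (NOT from Gcnf).
toy_utr = "CCGUAGUAAAGGCUUCAAUAAAGG"
hits = find_polya_signals(toy_utr)
for pos, hexamer in hits:
    print(f"{hexamer} at position {pos}")
```

Only the two hexamers named in the text are searched by default; other signal variants could be passed through the `signals` argument.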

Conclusions

The protein-coding region of GCNF is contained in 11 exons. Additional studies will be required to define the regulatory/promoter region. We think the genomic structure of this first, and at present only, member of the sixth subfamily of nuclear receptors will be useful for further studies of this unique receptor.

Figure 1

The location of the different exons in the GCNF amino-acid sequence. The core DNA-binding domain is underlined.

Figure 2

Sequence of exon 1 of Gcnf. The location of the EcoRI site (GAATTC) marking the 5'-end of the Gcnf cDNA (GenBank entry MMU09563) and the putative translational start codon (ATG) are underlined.

Table 1 Organization of the mouse Gcnf gene