Background

Interferons (IFNs) are a family of secreted cytokines [1, 2] that exert their biological activities by binding to specific cell membrane receptors to trigger a well characterised intracellular signalling pathway [3, 4] culminating in the transcriptional induction of IFN stimulated genes (ISGs). It is through the ISGs that IFNs generate diverse cellular and physiological states involving antiviral, apoptotic, antiproliferative, antitumor and immunomodulatory activities [4]. Oligonucleotide arrays have been used to show that there are several hundred ISGs [5]. ISGs can be responsive to type I (α/β) IFNs, type II IFNs (γ) or both. The DNA motifs close to or within the ISGs that mediate these responses are the 14 nt IFN Stimulated Response Elements (ISREs) and the 9 nt GAS elements for type I and type II IFNs, respectively. Most ISGs code for proteins whose biochemical and cellular roles are either well understood (e.g. the genes for RNA-dependent protein kinase PKR [6, 7], 2'-5' Oligoadenylate Synthetase [8, 9] and the genes of the MHC [10, 11]), or partially understood (e.g. the p202 genes [12], p56 [13], and the 1–8 family [14]). There remain some prominent ISGs, however, including 6–16 [15] and ISG12 [16], for which there are no known biochemical or cellular functions.

IFN is used in the treatment of several human diseases including Hepatitis C [17, 18] and multiple sclerosis [19]. Unfortunately, IFN treatment can have unwanted side effects [20] the mechanisms of which remain unclear. It has, therefore, long been recognised that to thoroughly understand IFN function and to minimise the side effects of IFN therapies, a more complete understanding of the ISGs is required.

The type I IFN stimulated human 6–16 and ISG12 (herein renamed as ISG12(a)) are ISGs that encode small hydrophobic proteins (Mr 12.9 kDa and 11.5 kDa, respectively). The predicted proteins share 36% overall amino-acid identity and 49% identity over an ~80 amino acid length. Both genes are regulated by type I IFNs in a number of cell lines [21–23]. Human 6–16, in particular, is characterised by its high inducibility in response to IFN. In HeLa cells 6–16 mRNA can constitute as much as 0.1% of the total mRNA after IFN stimulation [22]. It is therefore likely that these genes play an important role in the IFN response.

Despite gene disruption [24] and over-expression [25] studies, cellular and/or biochemical roles for the 6–16 and ISG12 gene products have not been identified. One way to address this problem is to identify related genes whose characteristics may provide clues to function.

Here we present in silico data identifying a novel family of genes (the ISG12 gene family) related to human 6–16 and ISG12. Each family member codes for a small, hydrophobic protein that contains a conserved, 80 amino-acid motif. So far, 46 members of this family have been determined in 25 organisms ranging from higher mammals to single-celled amoebae. While none of the genes has a known function, identification of this family indicates a number of systems in which gene function can be investigated.

Results and discussion

Identification and characterisation of ESTs related to human 6–16 and ISG12

We have used the human 6–16 (accession number: BN000257) and ISG12 (accession number: BN000225) sequences to conduct an online BLAST search, at the protein level, of EMBL and Genbank databanks and so uncover ~1,500 separate nucleotide sequences (EST, genomic, mRNA etc.). Alignment of these sequences allowed subdivision into 46 different transcripts (named according to phylogenetic analysis, below) originating from 25 different species (summarised, Table 1). Clustal W [26] was used to align the ~1,500 nucleotide sequences and generate a consensus sequence for each of the 46 different transcripts. These consensus sequences were then submitted to EMBL as third party annotations (accession numbers shown in Table 1). Putative protein coding sequences were determined using the first ATG rule [27]. Performing a BLAST search with any of these 46 putative proteins uncovers nearly the same set of ~1,500 sequences described above. This suggests that there is a highly conserved region shared by these sequences.
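The first ATG rule mentioned above can be illustrated with a minimal sketch. It assumes the standard genetic code and a consensus transcript given as a plain DNA string; the function name `translate_from_first_atg` and the use of `X` for ambiguous codons are illustrative choices, not details taken from the analysis reported here.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_atg(transcript):
    """Translate from the first ATG up to the first in-frame stop codon.

    Returns an empty string if the transcript contains no ATG.
    """
    seq = transcript.upper().replace("U", "T")
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        # Codons containing ambiguous bases are reported as X (an assumption).
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

A sketch of this kind only selects the 5'-most ATG; it does not assess the sequence context of the start codon or the completeness of the 5' end of an EST-derived consensus.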

Table 1 ISG12 genes found through Blast database searches

Identification of a conserved ISG12 protein motif

An alignment of the 46 putative amino-acid sequences listed in Table 1 was used to identify an ~80 amino-acid ISG12 motif (Figure 1) shared by all. This alignment has been submitted to Pfam (http://www.sanger.ac.uk/Software/Pfam/, Version 10, accession number: PF06140) and represents a newly identified gene family, the ISG12 gene family. Using the ISG12 motif for BLAST searches did not uncover any further family members. All family members encode a single motif except for one of the mouse genes, ISG12(b), which encodes two.

Figure 1

Alignment of putative protein sequences identifies an ISG12 protein motif. Putative protein products of the genes in Table 1 were aligned using Clustal W [26] and annotated using Boxshade http://www.ch.embnet.org/software/BOX_form.html. The section of this alignment where the predicted proteins share greatest amino-acid identity is shown. The consensus sequence for this region represents residues that are conserved in 50% or more of the sequences and defines the ISG12 motif (Pfam accession number: PF06140). Black squares represent sequence identity and grey squares represent sequence similarity. Numbers flanking the sequence represent amino-acid numbers in the putative proteins.
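The 50% consensus rule described in this legend can be sketched as follows. This is a minimal illustration only: it assumes the alignment is supplied as equal-length strings with `-` for gaps, that gaps never contribute to the consensus, and that positions without a qualifying residue are written as `.`. None of these conventions is specified in the text, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(aligned_seqs, threshold=0.5, gap="-"):
    """Return the consensus of an alignment, one character per column.

    A residue is reported when it occurs in at least `threshold` of all
    sequences; otherwise the column is written as '.'.
    """
    n_seqs = len(aligned_seqs)
    out = []
    for column in zip(*aligned_seqs):
        # Count residues only; gaps are ignored but still count in n_seqs.
        counts = Counter(res for res in column if res != gap)
        if counts:
            # On an exact tie, the residue seen first in the column is used.
            residue, count = counts.most_common(1)[0]
            if count / n_seqs >= threshold:
                out.append(residue)
                continue
        out.append(".")
    return "".join(out)
```

For example, `consensus(["MKLV", "MKAV", "MKA-", "GRS-"])` gives `MKAV`, because each reported residue is present in at least two of the four sequences.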

Analyses of predicted protein sequences with programme suites such as PredictProtein http://www.embl-heidelberg.de/predictprotein/predictprotein.html or EMBOSS http://www.rfcgr.mrc.ac.uk/Registered/Webapp/emboss-w2h/ did not give many clues as to signalling, structure or function. The proteins identified are hydrophobic (Table 1, Figure 2) raising the possibility that they may be membrane-associated proteins. Transmembrane prediction programmes (TMHMM [28], HMMTOP [29] and SMART [30]) gave varied and therefore inconclusive results (data not shown), although immunocytochemical analysis appears to locate ISG12(a) to the nuclear membrane [25].

Figure 2

Predicted protein hydrophobicity. Kyte-Doolittle schematics were formulated using GREASE software http://www.rfcgr.mrc.ac.uk to show the hydrophobicity of predicted protein sequences for mouse ISG12(a), human ISG12(a) and human 6–16. ISG12 motifs are highlighted in yellow.
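For readers unfamiliar with Kyte-Doolittle plots, the underlying calculation is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the window length of 9 is an assumption, since the window used for Figure 2 is not stated, and the function name `hydropathy_profile` is illustrative rather than part of the GREASE software.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along the protein.

    The list is empty when the protein is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[res] for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Positive window averages indicate hydrophobic stretches; sustained positive values over roughly 20 residues are the kind of signal that transmembrane predictors look for, although such a plot alone does not establish membrane association.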

Regions outside the motif do not share good sequence identity, but are similar in that they are often short, hydrophobic and contain a high percentage of residues with small side groups (A, B, C, D, G, N, P, S, T and V).
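The proportion of small residues in a region can be computed directly from the residue set listed above. The following is a minimal sketch; the function name `small_residue_fraction` is hypothetical and no particular threshold for a "high percentage" is implied.

```python
# Residues with small side groups, as listed in the text.
SMALL_RESIDUES = set("ABCDGNPSTV")


def small_residue_fraction(region):
    """Return the fraction of residues in `region` that have small side groups."""
    region = region.upper()
    if not region:
        return 0.0
    return sum(res in SMALL_RESIDUES for res in region) / len(region)
```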

Phylogenetic analysis

In order to determine whether the 46 genes identified form particular subgroups, phylogenetic analysis was performed in higher mammals.

The mammalian sequences from Figure 1 were used to compose a parsimonious tree using Dictyostelium discoideum as a distant relative for the out-grouping of the tree (Figure 3). Four tentative clades were identified in this way: 6–16, ISG12a, ISG12b and ISG12c. Bootstrapping shows that, where the 6–16 genes have stable branches, the remaining genes cannot be as stringently divided in this way. Only genes in closely related species seem to give stable branching points (i.e. the mouse and rat ISG12a and ISG12b genes), suggesting that the ISG12 gene-products are less uniformly divergent than the 6–16 gene products.

Figure 3

Phylogenetic analysis of the ISG12 family. Maximum parsimony tree with bootstrap confidence levels based on putative protein coding sequence of higher mammals using D. discoideum as an out-group (see materials and methods). Two stars represent a bootstrap confidence level >85%, one star >60%.

Gene organisation

To clarify the grouping of family members, BLAST was used to identify ISG12 genomic sequences. Aligning mRNA and genomic DNA sequence revealed intron/exon structure and, for organisms with sufficiently complete genome sequences (human, mouse, rat), chromosomal locations of ISG12 family members could be identified. The results for human and mouse genes are summarised in Figure 4. The human, mouse and rat (not shown) ISG12 genes cluster at syntenic loci (14q32, 12F1 and 6q32, respectively). By identifying conserved genes immediately flanking these clusters (ATP-dependent RNA Helicase DDX24 (Dead-Box Protein 24) and Heat-Like Repeat-containing protein isoform 1), the mouse and human loci could be correctly aligned, simplifying comparison of ISG12 gene organisation between the two species. The relative positions and opposing orientations of hISG12(a) and hISG12(b) are matched in the mouse locus by mISG12(a) and mISG12(b1)/mISG12(b2). This, plus the conserved intron/exon arrangement in hISG12(b), mISG12(b1) and the two halves of mISG12(b2), suggests that hISG12(a) and mISG12(a) are orthologues, and that the mISG12(b1) and mISG12(b2) genes arose from an ancestral murine orthologue of hISG12(b) by two gene duplications and one gene fusion. This leaves only hISG12(c) whose orientation is consistent with it having arisen by duplication of hISG12(a). Thus the phylogenetic relationship of the ISG12 genes, reflected in their assigned suffixes (a, b, b1 etc), is supported by the analysis of gene structure.

Figure 4

Genomic organisation of ISG12 genes in humans and mice showing ISRE positions. Regions of human chromosomes 1p35 and 14q32, and mouse chromosome 12F1, carrying ISG12 genes are shown. Transcribed regions are coloured (thickened lines, introns; boxes, exons) and arrows represent direction of transcription. Translational initiation and termination sites are indicated by green and red circles, respectively. Exons encoding an ISG12 motif are starred. The positions of numbered ISRE sequences (as defined in Table 2) are indicated. Orientations relative to Telomeres (TEL) and Centromeres (CEN) are indicated. The regions shown can be accessed from the Ensembl website using the following addresses: http://www.ensembl.org/Homo_sapiens/contigview?chr=1&vc_start=27575333&vc_end=27625333&highlight=ENSG00000126709; http://www.ensembl.org/Homo_sapiens/contigview?chr=14&vc_start=92535062&vc_end=92590000&x=0&y=0; http://www.ensembl.org/Mus_musculus/contigview?chr=12&vc_start=97445000&vc_end=97475000&x=38&y=12.

Table 2 ISREs found at human and mouse ISG12 gene loci

Where intron/exon structure is available, intron position was marked on an alignment (as in Figure 1) of predicted protein sequences (Figure 5). Intron site conservation at the N and C termini of the motif is much more pronounced than elsewhere. This is consistent with the possibility that the motif represents a structural domain that is evolutionarily conserved while being placed in different sequence contexts by exon shuffling. Structural analyses will be required to test this possibility.

Figure 5

Alignment of introns in ISG12 amino-acid sequences. Predicted ISG12 protein sequences, for genes whose intron/exon structures are available, are aligned as in Figure 1, with the ISG12 motifs highlighted in yellow. Positions of introns lie between the amino acids that have been marked (⇓).

We can postulate, then, that the ISG12 family arose from an ancestral gene that underwent an initial gene duplication event to form ISG12(a) and ISG12(b). This event probably happened between the emergence of amoeba and divergence of fish, judging by the identification of only one ISG12 in simple eukaryotes, such as Dictyostelium, and multiple ISG12s in mammals, fish and birds. The 6–16 clade appears to have arisen by interchromosomal duplication just before the divergence of the ungulates and primates. The ISG12(b2) and ISG12(c) genes probably arose relatively recently (phylogenetic evidence suggests that cow b2 and mouse b2 are probably not orthologues (Figure 3)) as these have not been found in other organisms. That the ISG12 motif has been found in organisms that do not host the IFN signalling pathway indicates that the IFN responsiveness seen in human 6–16 and ISG12(a) arose later in evolution. There may be a basic function shared by all family members that has been co-opted in higher organisms to become part of the IFN response.

IFN responsiveness of ISG12 transcripts

The finding that the ISG12 motif occurs in simple eukaryotes suggests that not all ISG12 genes in higher organisms have necessarily been incorporated into the IFN response. We therefore looked at the number and fidelity of Interferon Stimulated Response Elements (ISREs; RGGAAA NNGAAACT) [23] in the vicinity of the human and mouse ISG12 genes (Figure 4, Table 2).
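A search for the ISRE consensus can be sketched as a pattern match on both strands. The example below expands the IUPAC codes in the 14 nt consensus (R = A or G, N = any base) into a regular expression and reports perfect matches only; scoring of imperfect sites, which the fidelity comparison in Table 2 would require, is not attempted. The function names and the 0-based coordinate convention are assumptions made for illustration.

```python
import re

# IUPAC codes needed for the ISRE consensus given in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}
ISRE_CONSENSUS = "RGGAAANNGAAACT"


def iupac_to_regex(motif):
    """Expand an IUPAC motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_isres(dna):
    """Return sorted (start, strand, site) tuples for perfect ISRE matches.

    `start` is the 0-based position of the leftmost base of the site on the
    forward strand; `site` is always written 5' to 3' on its own strand.
    """
    dna = dna.upper()
    # The lookahead allows overlapping sites to be reported.
    pattern = re.compile("(?=(%s))" % iupac_to_regex(ISRE_CONSENSUS))
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(dna)]
    for m in pattern.finditer(reverse_complement(dna)):
        start = len(dna) - m.start() - len(ISRE_CONSENSUS)
        hits.append((start, "-", m.group(1)))
    return sorted(hits)
```

For instance, `find_isres("TTAGGAAATCGAAACTCC")` reports a single forward-strand site, `AGGAAATCGAAACT`, starting at position 2.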

Semi-quantitative RT-PCR analysis was used to determine whether the human and mouse ISG12 genes were responsive to type I IFN (Figure 6). This confirmed that, as previously shown [21, 22], human 6–16 and ISG12(a) are highly IFN-responsive in the human fibrosarcoma HT1080 cell line. As might be expected from the positioning of ISREs so close to exon 1 (Figure 4), mouse ISG12(a) is also IFN-responsive, though much less strongly (Figure 6) than human ISG12(a). Mouse ISG12(b1) and (b2) are IFN-responsive in the fibroblast cell line L-929, but human ISG12(b) and (c) were not induced in HT1080 cells despite the proximity of putative ISRE sequences. Figure 6 does show that all the genes tested are transcribed in the cell lines tested and does not preclude IFN-stimulation in other cell lines of those genes so far not found to be inducible.

Figure 6

Expression of human and mouse ISG12 genes in cell lines. Transcripts for the indicated human and mouse ISG12 genes were detected by RT-PCR. RNA was isolated from the human HT1080 cells (a) or mouse L-929 cells (b) treated with (left hand panels) or without (right hand panels) type I IFN for 24 h. The indicated serial five-fold dilutions of reverse transcripts were analysed by PCR. Most PCR assays included, as an internal control, primers for beta-actin (β-Ac). Diagnostic products for each transcript are arrowed. M = size markers.

Conclusions

Using EMBL and Genbank searches, we have been able to compile a family of 46 genes related to the human 6–16 and human ISG12(a) genes. Aligning all 46 genes reveals an ~80 amino-acid motif (the ISG12 motif) that is shared between genes in species as diverse as amoeba and humans. The 46 genes identified code for highly hydrophobic, potentially membrane-embedded proteins and fall into four main groups; 6–16, ISG12(a), ISG12(b) and ISG12(c). These four distinct gene groups seem to have been derived through gene duplication and divergent evolution, with all genes, other than 6–16, remaining clustered in syntenic loci.

The existence of a member of the ISG12 gene family in Dictyostelium discoideum, which does not possess the IFN system, combined with evidence that some family members in higher eukaryotes are not IFN-responsive (at least in some cell types), suggests that IFN stimulation plays an ancillary role in ISG12 gene function, and is not a defining characteristic. Further work is required to identify any unifying biochemical or cellular function for the ISG12 family. One such function may be as part of a response to cellular or environmental stress. This would certainly encompass those family members that have become part of the IFN system, which is itself a response to cellular insults such as viral infection and oxidative stress, while allowing for the possibility that other family members contribute to combating cellular stress independently of the IFN system. In organisms that have multiple family members, functional redundancy may complicate genetic analyses of ISG12 gene function, and multiple gene-knockouts or knockdowns may be required to reveal a clear phenotype. Members of the ISG12 gene family were not found in common laboratory model organisms such as fruit flies or nematode worms, despite searching complete genomes with the motif. However, studies of simpler organisms, such as the slime mould D. discoideum, with only one ISG12 gene, may provide a useful alternative approach.

Methods

Database searches and sequence alignments

The Genbank and EMBL databases were screened using the online BLAST [31, 32] server at http://www.ncbi.nlm.nih.gov/BLAST and http://menu.rfcgr.mrc.ac.uk/cgi-bin/blast (authorization required) respectively. Searches were performed at both the nucleotide (blastn) and amino acid (blastp) levels.

Sequences were aligned using Clustal W [26] in the MAGI (Multiple Alignment General Interface) suite at http://menu.hgmp.mrc.ac.uk/menu-bin/MAGI/magi (authorization required) and then manipulated for presentation using Boxshade at http://www.ch.embnet.org/software/BOX_form.html.

Putative proteins for mouse ISG12(a), human ISG12(a) and human 6–16 were used for basic in silico protein structure analysis using the programmes HMMTOP [29], SMART [30] and TMHMM [28] to give prediction of transmembrane helices and hydrophobicity plots.

Phylogenetic analysis

Putative amino acid sequences of genes found above were aligned using Clustal W as above. Phylogenetic analyses of the alignment were conducted using Protpars in the Phylip package [33] through the PIE (Phylogeny Interface Environment) suite at http://www.hgmp.mrc.ac.uk/Registered/Webapp/pie/. A maximum parsimony tree with branch confidence values based on 1000 bootstrap replicates was constructed. The putative protein coding sequence for D. discoideum was used as an out-grouping species. The tree was then annotated with bootstrap confidence levels (* = > 60 %, ** = > 85 %).
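The principle behind the bootstrap replicates can be sketched as column resampling of the alignment. The example below is an illustration of the general technique, not a description of the software used here: each replicate is built by drawing alignment columns at random with replacement, and a tree would then be inferred from every replicate so that the percentage of trees containing a given branch can be reported. The function name `bootstrap_alignments` and the fixed random seed are assumptions.

```python
import random


def bootstrap_alignments(aligned_seqs, replicates=1000, seed=0):
    """Yield pseudo-alignments made by sampling columns with replacement.

    Each replicate has the same number of sequences and columns as the input,
    and the sequence order is preserved.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    columns = list(zip(*aligned_seqs))
    n_cols = len(columns)
    for _ in range(replicates):
        sampled = [columns[rng.randrange(n_cols)] for _ in range(n_cols)]
        yield ["".join(row) for row in zip(*sampled)]
```

Tree inference and the counting of branch support across replicates are deliberately left out, as they depend on the parsimony implementation being used.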

Cell lines

The human cell lines HT1080 (ATCC#CCL-121) and HEK293 (ATCC#CCL-1573) and the mouse cell line L-929 (ATCC#CCL-1) were cultured as monolayers in 1 × Dulbecco's modified Eagle medium (GIBCO) supplemented with heat inactivated FCS (10 % (v/v), Globepharm), L-glutamine (2 mM, GIBCO), non-essential amino acids (0.4 mM, GIBCO), sodium pyruvate (1 mM, GIBCO) and antibiotics penicillin/streptomycin (100 U ml⁻¹ and 100 μg ml⁻¹ respectively, GIBCO). The cells were grown to ~40% confluency and then induced using type I IFN (AD hybrid IFN, 200 IU ml⁻¹, 24 hours).

RT-PCR of ISG12 transcripts

RNA was prepared from cultured cells as described [23] and used (2 μg) with oligo(dT)18 primer to synthesise first-strand cDNA (Promega).

PCR reactions (25 μl) were set up with dNTPs (0.25 mM, Pharmacia), experimental primers (100 ng each, Genosys), control primers (as described, 18S QuantumRNA, Ambion), 1 × reaction buffer (Qiagen), Taq polymerase (1.25 U, Qiagen) and template (1 μl of 25 μl cDNA-synthesis reaction). The following oligonucleotides were used as primers: mISG12(a)f: 5'-GGTGTGTCTTCCTGCACAGTGG-3'; mISG12(a)b: 5'-GGCAATATGTGTTAGGAGATTGTCG-3'; mISG12(b1)f: 5'-TTGCCAATGGAGGTGGAGTTGCAG-3'; mISG12(b1)b: 5'-ATCAGTGAGGGTTCTGAAGGTGCC-3'; mISG12(b2)f: 5'-CCATAGCAGCCAAGATGATGTCTG-3'; mISG12(b2)b: 5'-TTGCCACACCAACAAACCATC-3'; hISG12(a)f: 5'-TCTCACCTCATCAGCAGTGACCAG-3'; hISG12(a)b: 5'-CCTCTGGAGATGCAGAATTTGG-3'; hISG12(b)f: 5'-GTAACACCCCAAGAACGCTGTC-3'; hISG12(b)b: 5'-GCATCTGCATGTGACCTTTATTCC-3'; hISG12(c)f: 5'-GCACCTCCTCTTACAGCTTTACTCC-3'; hISG12(c)b: 5'-GGAGACTTGTCCTTTGGAAGATTG-3'; h6–16f: 5'-GATTGCTTCTCTTCTCTCCTCCAAG-3'; h6–16b: 5'-TCGAGATACTTGTGGGTGGCGTAG-3'.

Non-saturating, duplex PCR was performed under the following conditions: 1 cycle (4 min, 94°C), 28 cycles (30 s, 94°C; 30 s, 60°C; 45 s, 72°C), 1 cycle (5 min, 72°C). Products were analysed by agarose gel electrophoresis.