Background

Ornithine decarboxylase antizymes are important negative regulators of cellular polyamine levels. In mammals, antizyme-1 inhibits ornithine decarboxylase (ODC), the enzyme that catalyzes the first and rate-limiting step in polyamine biosynthesis. Antizyme-1 binds to ODC and targets it for ubiquitin-independent degradation by the 26S proteasome in a multiple-turnover manner (a single antizyme molecule can cause degradation of several ODC molecules) [1, 2]. Additionally, antizyme-1 regulates the intracellular concentration of polyamines by inhibiting cellular import of polyamines and accelerating polyamine export from the cell [3-5]. While genomes of lower eukaryotes contain single antizyme genes, multiple paralogs have evolved in higher eukaryotes, with at least two antizymes in vertebrates [6, 7], three in mammals [8, 9] and up to five in certain fish species [10]. Antizyme paralogs vary somewhat in their function, although all are implicated in the regulation of polyamine synthesis (and some are reported to be linked to other pathways [11, 12]). Antizyme paralogs usually have distinct expression patterns, with certain paralogs expressed in a strictly tissue-specific manner, such as the testis-specific mammalian antizyme 3 [8, 9] or the retina- and brain-specific antizyme AZR from Danio rerio [13]. Reviews of antizyme function and distribution are available [10, 14, 15].

Given the important role that antizymes play in the regulation of polyamine concentrations, it is not surprising that their own biosynthesis is regulated in response to changes in cellular polyamine concentrations. Polyamine concentrations are sensed during the elongation stage of antizyme mRNA translation. Unlike the great majority of coding sequences (CDSs), the CDS of virtually all eukaryotic antizymes consists of two overlapping open reading frames (ORFs). Synthesis of full-length antizyme protein requires a portion of translating ribosomes to switch translation phase at the end of the first ORF into the partially overlapping second ORF (in the +1 translation phase), a process termed programmed ribosomal frameshifting [16]. Ribosomes that do not shift frames terminate at the end of the first ORF and release a relatively short polypeptide. Increases in cellular polyamine levels raise frameshifting efficiency and therefore the synthesis of fully functional antizyme. The competition between frameshifting and termination at the end of the first ORF thus acts as a sensor of polyamine concentration and provides an elegant mechanism for negative regulatory feedback (Figure 1A).

Figure 1

Scheme of negative regulatory feedback in regulation of antizyme synthesis and conservation of the frameshift site. A. The competition between ribosomal frameshifting and standard translation (termination) is sensitive to polyamine levels. An increase in polyamine concentrations shifts the competition toward frameshifting, resulting in elevated antizyme synthesis and consequent inhibition of polyamine synthesis and uptake. A decrease in polyamine concentrations shifts the competition toward standard translation and produces the opposite effect on the synthesis of antizyme and polyamines. B. WebLogo representing the alignment of 153 OAZ sequences. The last codon in the zero frame and the first codon in the +1 frame are indicated by red and blue bars, respectively. The only universally conserved nucleotide of the frameshift site is T (U in mRNA), corresponding to the first position of the stop codon at the end of the first ORF.
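
The two outcomes of decoding an antizyme-like mRNA can be illustrated with a toy example. The following Python sketch is not part of OAF (which is written in Perl); the sequence is invented, and the functions `translate` and `decode_with_plus1_shift` are hypothetical names introduced here. It assumes the simplest possible model, in which a shifting ribosome skips the first nucleotide of the ORF1 stop codon and continues in the +1 frame.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start=0):
    """Translate seq from start up to the first in-frame stop codon.

    Returns (peptide, stop_index), where stop_index is the position of the
    stop codon, or None if the sequence ends before a stop is reached.
    """
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            return "".join(peptide), i
        peptide.append(aa)
    return "".join(peptide), None


def decode_with_plus1_shift(mrna):
    """Return (termination product, frameshift product) for a toy mRNA.

    Assumption: the +1 shift is modelled as skipping the first nucleotide
    of the ORF1 stop codon, then reading on in the new frame.
    """
    short, stop = translate(mrna)
    if stop is None:
        return short, None
    tail, _ = translate(mrna, stop + 1)
    return short, short + tail


# Invented sequence: ATG GCT TCC TGA in frame 0, GAT GCT TAA in frame +1.
print(decode_with_plus1_shift("ATGGCTTCCTGATGCTTAA"))  # ('MAS', 'MASDA')
```

In this toy case termination yields the short peptide MAS, whereas the shifted ribosome produces MASDA. Real frameshift sites are modulated by flanking stimulatory signals that this sketch ignores.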

The +1 frameshifting event during antizyme biosynthesis significantly complicates automatic detection of the full-length antizyme CDS in mRNA. This is due to the lack of reliable and efficient algorithms for predicting ribosomal frameshifting locations. A number of attempts have been made recently to develop computational approaches for predicting instances of ribosomal frameshifting [17-22]. Some of these approaches could be useful for detecting candidate sequences that are prone to efficient (not necessarily programmed) frameshifting within particular groups of organisms [17-19, 23]. However, they are not suitable for reliable detection of programmed ribosomal frameshifting events without experimental verification or additional expert human involvement. The reasons underlying the consistent failure to develop highly accurate algorithms for ribosomal frameshifting prediction lie in the very nature of programmed ribosomal frameshifting. The efficiency of ribosomal frameshifting is modulated by highly diverse sequence elements, many of which evolved independently. The mechanisms by which such elements alter translation also vary considerably. The situation is further complicated by differences in the translation machinery (sequences of ribosomal components, differences in tRNA properties and their relative concentrations) across organisms, so that the same sequence can be shift-prone in one organism but accurately translated in the standard triplet manner in another. Therefore, it is not possible to find even a single nucleotide sequence feature that would specify a site of ribosomal frameshifting in all organisms.

Information regarding the diversity of genes utilizing programmed ribosomal frameshifting for their expression, as well as the multifarious sequences that modulate the frameshifting process, is available in the Recode database, currently the richest Internet resource on the subject [24, 25], and in comprehensive literature reviews on this and related topics [26-35]. In fact, antizyme mRNAs themselves are currently the most plentiful source of diverse frameshift stimulatory signals, as is evident from the recent detailed review covering nearly three hundred antizyme mRNA sequences [10]. A collection of sequences described in that review was used here for the design of OAF (ODC antizyme finder) (Additional file 1).

Approaches that predict frameshifting specifically within particular clusters of related genes appear to produce more reliable results. Such approaches have been applied to -1 frameshifting involved in the synthesis of viral polyproteins [21], to different types of frameshifting events in the decoding of bacteriophage tail assembly genes [20], and to +1 frameshifting during the synthesis of bacterial release factor 2 [22]. Indeed, ribosomal frameshifting utilized by a group of homologous genes likely has the same origin. While evolution introduces organism-specific alterations in the sequence of the frameshifting cassette and diversifies the protein sequence, a detectable degree of similarity is frequently recognizable. Though the existence of such similarity may not be a universal rule (as is evident from the frameshifting utilized in decoding bacteriophage tail assembly genes [20], where only the genomic localization of overlapping ORFs is conserved), it holds true in many cases. Therefore, knowledge of a few examples of ribosomal frameshifting from homologous genes can be sufficient for designing algorithms for automatic and accurate prediction of the frameshifting utilized in decoding other members of the same gene family. By dealing with each group of homologous genes utilizing ribosomal frameshifting separately, one by one, we aim to build a collection of autonomous computer tools capable of automatically predicting most cases of ribosomal frameshifting in newly sequenced organisms. OAF is our second computer tool designed in pursuit of this goal. Our first tool, ARFA, detects and annotates the programmed ribosomal frameshifting required for expression of certain bacterial release factors [22]. Both tools will be used for future updates of the Recode database.

Implementation

OAF is written in Perl and uses BioPerl libraries [36]. The OAF web interface was implemented in PHP.

Outline of the analysis performed by OAF

Antizyme mRNAs from different organisms have evolved a remarkable assortment of RNA signals for stimulating or modulating the +1 ribosomal frameshifting used in their expression. Many sequence features are shared among closely related antizyme mRNAs. For example, two distinct types of frameshift-enhancing RNA pseudoknots are embedded in antizyme-1 and antizyme-2 mRNAs from mammals. Nevertheless, not a single feature is universally conserved. Instead of trying to account for known frameshifting stimulators, we have devised an antizyme gene detection scheme based on detection of sequences encoding antizymes. While antizyme protein sequences are highly diverse, there is a reasonable degree of sequence similarity within large phylogenetic groups, which allows their detection by similarity searches. Most importantly, eukaryotic antizyme genes share the same ORF organization: the upstream ORF is smaller than the downstream ORF, and the downstream ORF is always in the +1 translational phase relative to the first one. Therefore, our method is based on a search for two overlapping ORFs that match profile HMMs designed using sequences of known antizymes. The mutual orientation of the ORFs is then examined to verify that it corresponds to the expected transition between translational phases. For large sequences (>20 kb), OAF performs an initial FASTA search with relaxed parameters, using a mixture of divergent antizyme sequences as a query. This increases the speed of OAF by reducing the number of candidate sequences for subsequent HMM analysis, while the relaxed parameters decrease the chance of losing true positives at this step. The scheme of analyses performed by OAF is illustrated in Figure 2.
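
The check on the mutual orientation of the two ORFs can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the OAF implementation: the `Orf` type, the function `is_plus_one_pair`, the 0-based half-open coordinates and the example coordinates are all hypothetical, and the three conditions simply restate the ORF organization described above.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    """An ORF on the forward strand (hypothetical coordinate convention)."""
    start: int  # 0-based index of the first nucleotide
    end: int    # index one past the last nucleotide of the stop codon


def is_plus_one_pair(upstream: Orf, downstream: Orf) -> bool:
    """Check whether two ORFs are arranged as expected for an antizyme gene.

    Assumed criteria, taken from the description in the text:
      * the downstream ORF is in the +1 phase relative to the upstream ORF;
      * the upstream ORF ends inside the downstream ORF (they overlap);
      * the upstream ORF is the shorter of the two.
    """
    plus_one = (downstream.start - upstream.start) % 3 == 1
    overlapping = downstream.start < upstream.end < downstream.end
    upstream_shorter = (upstream.end - upstream.start) < (
        downstream.end - downstream.start
    )
    return plus_one and overlapping and upstream_shorter


# Invented coordinates for two candidate ORFs on one mRNA.
orf1 = Orf(start=90, end=300)
orf2 = Orf(start=262, end=775)
print(is_plus_one_pair(orf1, orf2))
```

A pair of hits that fails this test (for example, two ORFs in the same phase) would correspond to the sequences that OAF reports as not matching the organization expected for +1 frameshifting.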

Figure 2

Scheme of analyses performed by OAF. Each step of the OAF pipeline is shown as a box; grey boxes represent external modules utilized by OAF.

Profile HMMs and automatic classification of antizymes

To design the profile HMMs exploited by OAF, we used a collection of protein sequences derived from mRNA fragments using manually assembled ESTs. These sequences were described in some detail in a recent antizyme review [10] and are available with this article as Additional file 1 (manualOAZs.fasta). Evolutionary distances between protein sequences were estimated using the Neighbour-Joining algorithm and the Poisson correction evolutionary model implemented in the MEGA3.1 program [37]. Based on these distances, sequences were clustered into 12 homologous groups, for each of which a separate pair of profile HMMs was designed using HMMER [38]. These HMMs allow discrimination among different antizyme paralogs and permit approximate estimation of the taxonomic origins of antizyme-encoding sequences. The clustering is shown on the tree generated with MEGA3.1 (see Figure 3).
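
For reference, the Poisson correction converts the proportion p of differing aligned residues into a distance d = -ln(1 - p). The Python sketch below illustrates the formula only; the function name `poisson_distance` is hypothetical, the toy sequences are invented, and skipping any column with a gap in either sequence is an assumption made here, not necessarily the MEGA3.1 setting used for Figure 3.

```python
import math


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance between two aligned protein sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in columns) / len(columns)
    if p >= 1.0:
        return math.inf  # fully divergent: the correction is undefined
    return -math.log(1.0 - p)


# One difference in four compared columns gives p = 0.25.
print(round(poisson_distance("MKTA", "MKSA"), 4))  # 0.2877
```

A matrix of such pairwise distances is the kind of input from which a Neighbour-Joining tree is built.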

Figure 3

OAZ clustering. A circular tree of OAZ sequences representing clustering that has been used to design profile-HMMs used by OAF.

A separate profile HMM is built for the frameshift site itself. This HMM is not used for identification of antizymes or frameshift sites. However, a predicted frameshift site is compared to the HMM, and the corresponding E-score can be reported in the output to facilitate further processing of the data, such as identification of unusual frameshift sites or detection of sequencing errors disguised as cryptic frameshift sites. Figure 1B illustrates the conservation of OAZ frameshift sites as a WebLogo [39].

OAF I/O interface

Input

Two types of searches can be performed by OAF. First, a given nucleotide sequence or multiple sequences (provided either in a user's file in FASTA format or as a GenBank accession number) can be analyzed for the presence of an antizyme CDS (first two modes in Figure 2). Second (third mode in Figure 2), protein sequences of known antizymes in a user's FASTA file can be used as queries for a search against a database of nucleotide sequences (either a local BLAST database or a remote BLAST database at NCBI). A user can specify the genetic code table and the usage of alternative initiation codons (by default, a CDS can start only with ATG/AUG).

Output

OAF reports sequences of encoded antizymes either as raw sequence or in FASTA, GenBank or XML format. The XML output contains detailed information regarding the frameshift site and is compatible with a future version of the Recode database. By default, OAF reports all sequences encoding antizymes, even if their ORF organization does not correspond to that required for +1 frameshifting or if only a partial antizyme sequence is found. Such sequences, which are likely erroneous, can be filtered out automatically.

Web interface

The web interface of OAF (see the Availability and requirements section) serves mostly illustrative purposes and has limited capabilities compared to the full version of OAF. The web service allows analysis of a single user-provided sequence for the presence of an antizyme-encoding sequence.

Results and Discussion

To evaluate OAF prediction sensitivity for genome annotations, the mRNA sequences of 20 completed eukaryotic genomes were downloaded from the RefSeq database [40]. OAF detected 18 OAZ genes (Table 1). No genes encoding antizymes were detected in plant genomes (Table 1). To evaluate OAF prediction selectivity, a random sequence database (totalling 10 Tbp) was generated by a fifth-order Markov chain based on the six-mer frequencies of each mRNA in the genomic sequences. OAF did not detect any OAZ sequences in this database.
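
The construction of such a random control can be sketched in Python. This is an illustration under assumptions rather than the script used for the evaluation: the names `train` and `generate`, the seeding with the first five bases of the template, and the restart rule for contexts without an observed successor are all choices made here.

```python
import random
from collections import Counter, defaultdict

ORDER = 5  # a fifth-order chain conditions each base on the previous five


def train(seq):
    """Count six-mers as: five-base context -> counts of the next base."""
    model = defaultdict(Counter)
    for i in range(len(seq) - ORDER):
        model[seq[i:i + ORDER]][seq[i + ORDER]] += 1
    return model


def generate(seq, length, rng):
    """Emit a random sequence that follows the six-mer statistics of seq."""
    if len(seq) <= ORDER:
        raise ValueError("template must be longer than the chain order")
    model = train(seq)
    out = list(seq[:ORDER])  # assumption: seed with the first five bases
    while len(out) < length:
        followers = model.get("".join(out[-ORDER:]))
        if not followers:
            # Context seen only at the very end of the template: restart
            # from a randomly chosen observed context (an ad hoc choice).
            out.extend(rng.choice(sorted(model)))
            continue
        bases, weights = zip(*sorted(followers.items()))
        out.append(rng.choices(bases, weights=weights)[0])
    return "".join(out[:length])


# Invented template standing in for one mRNA; seeded for reproducibility.
template = "ATGGCTTCCTGATGCTTAA" * 3
print(generate(template, 60, random.Random(0)))
```

Because each base is drawn conditionally on the preceding five, the output preserves local composition such as codon-like periodicity and six-mer usage while scrambling any longer-range signal, which makes it a reasonable negative control for a similarity-based detector.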

Table 1 OAZ sequences detected in completed genomes

To estimate OAF accuracy on EST sequences, the June 2007 release of dbEST was used [41]. OAF detected antizyme sequences in 6639 ESTs, of which 2067 are unique antizyme-coding sequences. Many of these sequences are truncated mRNA fragments that can be grouped as corresponding to the same antizyme mRNA. Twenty-four new antizyme sequences that were not present in the original dataset (Additional file 1) were detected (see Table 2).

Table 2 New OAZ sequences detected in dbEST

OAF has detected a number of highly similar variant OAZ sequences supported by multiple ESTs corresponding to the same species. Some of these sequences are most likely allelic variants while others correspond to recent gene duplication events. OAZ variants are summarized in Table 3.

Table 3 OAZ sequence variants in dbEST

OAF detected a number of sequences whose OAZ clustering (Figure 3) did not match the taxonomy of the source organisms. These sequences are likely contaminants that were introduced from pests, symbionts, food or cell hosts (see Table 4). Some of these contaminants were reported previously [10].

Table 4 Contaminant OAZ sequences

Conclusion

We have developed a simple computer utility for identification of OAZ-encoding sequences in nucleic acids, called OAF (ODC antizyme finder). It performs with high speed and accuracy on mRNA sequences annotated in completed genomes as well as on raw RNA sequences from EST collections.

Availability and requirements

* Project name: OAF (Ornithine Decarboxylase Antizyme Finder)

* Project home pages: http://recode.ucc.ie/oaf/

http://recode.genetics.utah.edu/oaf/

* Operating system(s): Platform independent

* Programming language: Perl, PHP

* Other requirements: Mandatory: BioPerl 1.5.1+, FASTA 3.4+, HMMER 2.3.2. Optional (required for searches against local blast databases): NCBI BLAST

* License: CCL

* Any restrictions to use by non-academics: yes, see the home page.